Spam and Spam Prevention

Contents

What is Spam

Statistics

Legal Situation

Types of Spam

Spammer Tricks

Spam Prevention

What is Spam?

Unsolicited, usually commercial, messages (such as e-mails, text messages, or Internet postings) sent to a large number of recipients or posted in a large number of places

An electronic message is "spam" if (A) the recipient's personal identity and context are irrelevant because the message is equally applicable to many other potential recipients; AND (B) the recipient has not verifiably granted deliberate, explicit, and still-revocable permission for it to be sent.

Word Origin

There is some debate about the source of the term, but the generally accepted version is that it comes from the Monty Python song, "Spam spam spam spam, spam spam spam spam, lovely spam, wonderful spam".

Statistics on Spam

Proportion of spam in global email traffic, 2018-2019

Sources of spam by country

In 2018, China (11.69%) led the list of source countries, swapping places with the US, the former leader, which fell to second place with 9.04%.

Third position went to Germany (7.17%), which climbed into the Top 3 from sixth.

Spam email size

In 2018, the share of very small (up to 2 KB) messages increased significantly.

Malicious attachments in email

In 2018, the most widely distributed malicious objects in email, assigned the Exploit.Win32.CVE-2017-11882 verdict, exploited a Microsoft Office vulnerability for executing arbitrary code without the user’s knowledge.

Countries targeted by malicious mailshots in 2018

As in previous years, first place in 2018 went to Germany. Its share accounted for 11.51% of all attacks. Second place was taken by Russia (7.21%), and Britain (5.76%) picked up bronze.

This graphic shows the leading countries of origin for spam emails as of 2nd quarter 2017 according to Statista.

Rating of categories of organizations attacked by phishers

In 2018, the global Internet portals accounted for the lion’s share of heuristic component triggers.

Numbers of the year 2018

● The share of spam in mail traffic was 52.48%, which is 4.15 percentage points less than in 2017. Some research companies estimate that spam email makes up an even greater portion of global emails, some 73% in fact.
● About 14.5 billion spam emails are sent every single day.
● Spammers receive 1 reply for every 12,500,000 emails sent.
● The biggest source of spam in 2018 was China (11.69%).

The Worst Spammers according to

As of April 2020 the world's worst spammer countries are:

● China
● USA
● Ukraine
● Russia
● Germany

Legal Situation

USA

Austria

Germany

Switzerland

Europe

CAN-SPAM Act 2003, USA

The law describes specific information each email must provide including a way for recipients to unsubscribe and it spells out penalties for violations. The main requirements are:

1. Don’t use false or misleading header information.
2. Don’t use deceptive subject lines.
3. Identify the message as an ad.
4. Tell recipients where you’re located.
5. Tell recipients how to opt out of receiving future email from you.
6. Honor opt-out requests promptly.
7. Monitor what others are doing on your behalf.

§ 107 Telekommunikationsgesetz, Austria

“§ 107 (2) TKG says that sending electronic mail, including SMS, is not permitted without the recipient's prior consent if it is sent for direct marketing purposes.”

For Promotional Calls & Email (incl. SMS):

● Prior consent of the recipient is needed if the message is direct advertising or is sent to more than 50 recipients
● Consent can be revoked
● § 6 Abs. 1 E-Commerce-Gesetz must not be violated

By making an unsolicited call, sending an unsolicited fax, or sending unsolicited electronic mail, the sender commits an administrative offense and is liable to a fine of up to €37,000.

§ 7 E-Commerce-Gesetz, Austria

§ 7 Unsolicited commercial communication (Nicht angeforderte kommerzielle Kommunikation)

● In force in Austria since 1 January 2002
● Unsolicited commercial communication must be clearly recognizable as such by the user
● The Rundfunk und Telekom Regulierungs-GmbH must keep a list on which persons and companies who do not wish to receive commercial communication by electronic mail can register free of charge

§ 6 Telemediengesetz, Germany

§ 6 Abs. 2 TMG 2007

“If commercial communications are sent by electronic mail, neither the sender nor the commercial character of the message may be disguised or concealed in the header and subject line.

Disguising or concealing occurs when the header and subject line are intentionally designed in such a way that the recipient, before viewing the content of the communication, receives no information or misleading information about the actual identity of the sender or the commercial character of the message.”

Legal Situation in Germany

● Competition Law
● Liability Law

§ 16 TMG Provisions on fines (Bußgeldvorschriften)

“(1) An administrative offense is committed by anyone who, contrary to § 6 (2) sentence 1, intentionally disguises or conceals the sender or the commercial character of the message.

(3) The administrative offense may be punished with a fine of up to fifty thousand euros.”

Bundesgesetz gegen den unlauteren Wettbewerb

Art. 23 UWG Unfair competition (Unlauterer Wettbewerb)

“Anyone who intentionally commits unfair competition under Article 3, 4, 5 or 6 shall, on complaint, be punished with imprisonment of up to three years or a monetary penalty.”

Bundesgesetz gegen den unlauteren Wettbewerb, Switzerland

Art. 3 Abs. 1 lit. o UWG

“[Anyone acts unfairly who] sends mass advertising with no direct connection to requested content by means of telecommunications, or causes such messages to be sent, and in doing so fails to obtain the customers' prior consent, to state the correct sender, or to point out an easy and free way to opt out; anyone who obtains customers' contact information when selling goods, works or services, and points out the opt-out option at that time, does not act unfairly if they send those customers mass advertising for their own similar goods, works or services without their consent.”

Legal situation in Europe in General

In the rest of Europe, the legal situation is based on the directive of the European Parliament and of the Council on the processing of personal data and the protection of privacy in electronic communications, which each member state implements in its national law:

“The sending of email advertising is only permitted if the recipient has given prior consent. The specific implementation in the respective national law differs in the respective countries.”

● DIRECTIVE 2002/58/EC OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL - Article 13 - Unsolicited communications
● REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL - Article 4 - Definitions (1) ‘personal data’

Types of Spam

Email

Comment Spam

Spamdexing

SPIT

Social Spam

E-Mail Spam

● About 70 to 80 percent of all email traffic is spam

● Most of the emails are advertisements of various kinds

● 50 percent of all spam emails fall into the following categories, in this order: adult content, health, IT, personal finances, education and training

● Malicious programs (also known as malware, evilware or junkware) are also often distributed via spam mails

Comment Spam

● Irrelevant comments posted to a blog for the sole purpose of dropping a link to the spammer’s website.
● Spammers are less concerned with the comments themselves than with getting backlinks
● Backlinks play a central role in successful search engine optimization
● Google's perspective -> any reference to a website is a recommendation
● These spammers not only litter blogs, but also search engine results and thus significantly pollute the Internet

Spamdexing

● Spamdexing is a word derived from “spam” and “indexing”

● Refers to the practice of search engine spamming (a form of SEO spamming)

● Practice of creating websites that will be illegitimately indexed with a high position in the search engines

● Common spamdexing techniques can be classified into two broad classes: content spam and link spam

SPIT

● Telephone spam that is transmitted over the Internet Protocol using IP telephony (Voice-over-IP)
● Term for unwanted, automated, pre-recorded telephone calls placed in large numbers
● Comparable to e-mail spam, but currently less widespread and requiring other protective measures
● Call machines can be used for telephone sales, , for alleged prize notifications or as lure calls ("lock calls")

Social Spam

● Social spam is spam aimed at users of a service on the Internet, such as MySpace, , Twitter or LinkedIn.
● Users of these social network services can send messages that contain embedded links that can lead to other places on the social network or even to external sites.
● Fake accounts and fake profiles are key to social spammers -> the more followers or friends they have, the harder it is to remove them

Spammer Tricks

Bayesian sneaking and poisoning

● Writing the spam message so that it avoids words normally used in spam messages (sneaking), or padding it with innocuous words to “poison” the Bayesian filter’s database.
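As a rough illustration of why padding a message with innocuous words can work, the sketch below combines per-word spam probabilities into a single score, in the style of a simple Bayesian filter. The word probabilities, the neutral value of 0.5 for unknown words, and the name `spam_score` are assumptions made up for this example; they do not come from any real filter.

```python
import math

# Hypothetical per-word spam probabilities, as a trained filter might store them.
WORD_SPAM_PROB = {
    "viagra": 0.99,
    "winner": 0.95,
    "meeting": 0.05,
    "report": 0.10,
    "thanks": 0.08,
}


def spam_score(message: str) -> float:
    """Combine per-word probabilities into one spam probability via log-odds."""
    log_odds = 0.0
    for word in message.lower().split():
        p = WORD_SPAM_PROB.get(word, 0.5)  # assumption: unknown words are neutral
        log_odds += math.log(p) - math.log(1.0 - p)
    return 1.0 / (1.0 + math.exp(-log_odds))


plain = "viagra winner"
padded = plain + " meeting report thanks"  # same spam, padded with innocuous words

print("plain :", round(spam_score(plain), 3))
print("padded:", round(spam_score(padded), 3))
```

With these made-up numbers, the padded message scores noticeably lower than the plain one, which is the effect the spammer is after.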

Encrypted messages

● Encrypting the message so that it is only decrypted once it reaches the mailbox

IP address

● Borrowing or using an IP address that has a good or neutral reputation

Spam Prevention

Basic Approaches

Spam Filter Categories

What is a Spam Filter?

Types of Spam Filters

Naive Bayes’ Filter

Basic Approaches

● Discretion

● Keep Software up to date

● Don’t subscribe to e-mail lists with primary e-mail

● Don’t respond to spam

● Use spam filters

What is a Spam Filter?

● Hard to define
● No singular spam filter for all email across all mailbox providers
● Each mailbox provider, company, and individual user can have their own version of a spam filter
● They all serve the same purpose of dissecting email and separating the good from the bad

Filter Categories

Gateway spam filters

● Physical server installed at the border of a company’s network
● Examples: Cisco’s IronPort and Barracuda

Third party (or hosted) spam filters

● Filter either at the gateway or after the message has been received through the gateway
● Examples: Cloudmark and MessageLabs

Desktop spam filters

● Live on the end user’s computer
● Example: Outlook, which uses Microsoft’s anti-spam filter SmartScreen

Types of Spam Filters

Content Filter

● A content filter can use the header information to make sure the sender isn’t on any blacklists, or to look for any suspicious stops the email made on its way to the recipient
● A content filter will also check the actual body of the email for suspicious content, such as trigger words or images that are typical of spammers
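The two checks above can be sketched with Python's standard `email` package. This is only a minimal illustration: the blacklist, the trigger words, the hop limit, and the function name `content_filter` are hypothetical, and the body check only handles simple plain-text messages.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical lists and limits; a real filter would use far larger, curated data.
BLACKLISTED_SENDERS = {"offers@spam.example"}
TRIGGER_WORDS = {"free", "winner", "urgent"}
MAX_HOPS = 10  # assumption: more "Received" headers than this looks suspicious


def content_filter(raw_email: str) -> bool:
    """Return True if the message looks like spam under these simple checks."""
    msg = message_from_string(raw_email)

    # Header check 1: is the sender address on the blacklist?
    _, sender = parseaddr(msg.get("From", ""))
    if sender.lower() in BLACKLISTED_SENDERS:
        return True

    # Header check 2: each relay adds a "Received" header (one per stop).
    if len(msg.get_all("Received") or []) > MAX_HOPS:
        return True

    # Body check: count trigger words in a plain-text body.
    body = msg.get_payload()
    words = body.lower().split() if isinstance(body, str) else []
    hits = sum(1 for w in words if w.strip(".,!?") in TRIGGER_WORDS)
    return hits >= 2
```

Calling `content_filter` on a raw message string returns `True` for a blacklisted sender or a body with at least two trigger words, and `False` otherwise.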

Heuristic / Rule Based Filter

● A rule-based filter filters email according to predetermined criteria.
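A rule-based filter of this kind can be sketched as a list of rules, each with a score, plus a threshold. The rules, weights, and threshold below are invented for illustration and are not taken from any particular product.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    score: float
    matches: Callable[[str, str], bool]  # (subject, body) -> does the rule fire?


# Hypothetical rules and weights, chosen only for this example.
RULES = [
    Rule("subject_all_caps", 2.0, lambda s, b: len(s) > 5 and s.isupper()),
    Rule("many_exclamations", 1.5, lambda s, b: (s + b).count("!") >= 3),
    Rule("mentions_money", 1.0, lambda s, b: re.search(r"[$€]\s?\d", b) is not None),
]
THRESHOLD = 3.0  # assumption: total score at or above this means spam


def rule_based_score(subject: str, body: str) -> Tuple[float, List[str]]:
    """Return the total score and the names of the rules that fired."""
    fired = [rule for rule in RULES if rule.matches(subject, body)]
    return sum(rule.score for rule in fired), [rule.name for rule in fired]


def is_spam(subject: str, body: str) -> bool:
    total, _ = rule_based_score(subject, body)
    return total >= THRESHOLD
```

Returning the names of the fired rules alongside the score makes it easy to explain why a given message was filtered.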

Bayesian Filter

● A Bayesian filter is a filter that learns your spam preferences. When you mark emails as spam, the system notes the characteristics of the email and looks for similar characteristics in incoming email, filtering anything that fits the pattern directly into the spam folder for you.
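The learning loop described above can be sketched as a small Naive Bayes classifier over words. This is a minimal illustration under stated assumptions: whitespace tokenization, add-one (Laplace) smoothing, and the class and method names are choices made for this example, not a description of any specific mail client.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Word-based Naive Bayes filter that learns from user-marked messages."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def mark(self, text: str, label: str) -> None:
        """Learn from one message the user marked as 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        total_msgs = sum(self.message_counts.values())
        # Smoothed log prior, so a class with no messages never yields log(0).
        score = math.log((self.message_counts[label] + 1) / (total_msgs + 2))
        total_words = sum(self.word_counts[label].values())
        for w in words:
            # Add-one smoothing: unseen words get a small non-zero probability.
            count = self.word_counts[label][w]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text: str) -> bool:
        words = text.lower().split()
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = max(len(vocab), 1)
        spam = self._log_score(words, "spam", vocab_size)
        ham = self._log_score(words, "ham", vocab_size)
        return spam > ham


f = NaiveBayesFilter()
f.mark("win free money now", "spam")
f.mark("free prize win", "spam")
f.mark("meeting notes attached", "ham")
f.mark("see you at lunch", "ham")
print(f.is_spam("free money"), f.is_spam("lunch meeting notes"))
```

Each further call to `mark` shifts the word counts, which is how the filter adapts to one user's mail over time.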

List Based Filters

● Blacklist
  ● A list of known spam addresses
  ● Can be defeated by using multiple email addresses
● Whitelist
  ● Contains non-spam email addresses
  ● Impractical on its own, because mail from legitimate senders who are not yet on the list is blocked
  ● Can be defeated by harvesting email addresses from address books
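A minimal sketch of both list types, and of the weaknesses noted above, might look like this. The addresses, the `whitelist_only` switch, and the function name are hypothetical.

```python
# Hypothetical lists; real deployments load these from a database or a feed.
BLACKLIST = {"spammer@bad.example"}
WHITELIST = {"friend@good.example"}


def list_filter(sender: str, whitelist_only: bool = False) -> str:
    """Return 'accept' or 'reject' based only on the sender address."""
    sender = sender.strip().lower()
    if sender in WHITELIST:
        return "accept"
    if sender in BLACKLIST:
        return "reject"
    # Unknown senders: a pure whitelist rejects them, a pure blacklist lets them in.
    return "reject" if whitelist_only else "accept"


# A spammer who switches to a fresh address slips past the blacklist ...
print(list_filter("spammer2@bad.example"))
# ... while a strict whitelist also blocks a legitimate first-time sender.
print(list_filter("new.customer@shop.example", whitelist_only=True))
```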

List Based Filters

● Greylisting is a mix between black- and whitelisting
● Email from an unknown sender is temporarily rejected and only accepted after another delivery attempt
● Spammers typically send each email only once

Challenge / Response System

● Blocks undesirable emails by forcing the sender to perform a task before their message can be delivered.
● If you send an email to someone who’s using a challenge/response filter, you’ll receive a challenge
● If you complete the challenge, the email is delivered to the recipient. If not, the message is rejected.
● The "challenge" is typically one that only a human can solve
● Might also block email newsletter subscriptions, as these messages are typically sent by automated programs.

Collaborative Filters

● Collects input from email users around the globe
● Users flag incoming emails as legitimate or spam, and the flags are reported to a central database.
● Once a message has been flagged as spam often enough to reach a threshold, the filter blocks it automatically
● A group of spammers, if large enough, could skew results

DNS Lookup Systems

● Attempts to verify that the domain name of the sender exists
● If there is no match, the message is treated as junk
● A reverse DNS lookup reveals the domain name associated with the sending server
● Not very effective on their own
● Reverse DNS lookups might produce false positives

Bayesian Filter - Naive Bayes’ Filter

Naive Bayes classifiers work by correlating the use of tokens (typically words, or sometimes other things), with spam and non-spam emails and then using Bayes' theorem to calculate a probability that an email is or is not spam.
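For a single token, the calculation referred to above can be written out as follows. The symbols are notation introduced here for illustration: S means "the message is spam", H means "the message is ham", and W means "the token appears in the message".

```latex
P(S \mid W) = \frac{P(W \mid S)\,P(S)}{P(W \mid S)\,P(S) + P(W \mid H)\,P(H)}
```

Under the "naive" assumption that tokens w_1, ..., w_n occur independently of one another given the class, the per-token terms are simply multiplied:

```latex
P(S \mid w_1, \dots, w_n) \;\propto\; P(S) \prod_{i=1}^{n} P(w_i \mid S)
```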

Bayes’ Theorem

Naive Bayes’ Filter - How it Works

● Assumes that the features are independent of one another
● The filter assigns each feature a probability of appearing in spam
● After training, the filter evaluates whether an email is spam or ham

Naive Bayes’ Filter - Advantages

Can be trained on a per-user basis.

● The word probabilities evolve over time with corrective training
● After training, Bayesian spam filtering is often more accurate than pre-defined rules.

Naive Bayes’ Filter - Disadvantage

Bayesian Poisoning

● Insertion of words that are not normally associated with spam
● Words may also be transformed by spammers. For example, «Viagra» would be replaced with «Viaagra» or «V!agra» in the spam message
● Replacing text with pictures, either directly included or linked

Naive Bayes’ Filter Examples

Project

Spam Classifier

● Naive Bayes
● Bag of Words model
● Train classifier using a pre-existing dataset
● Classifier decides if the text in a file is spam or ham

Project - Dataset

Project - Results

● The spam classifier is most effective for short messages
● Short messages usually consist mostly of keywords
● Keywords can easily be identified as spam or ham

Project - Limitations

● The spam classifier is not as effective for long messages
● Long messages contain many filter words
● Filter words decrease the likelihood that an email is classified as spam

Spam as Ham


● If a ham message contains “spammy” phrases, it gets classified as spam
● “Spammy” phrases are usually used in sales / marketing campaigns
● If a ham message contains a link, it is likely to be classified as spam

Ham as Spam

Project - Potential improvements

● Use a bigger data set; this will enable more reliable predictions
● Use an n-gram model instead of the bag-of-words model
● Account for the fact that features are not independent of one another (which contradicts the assumption of Naive Bayes)
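To illustrate the second suggestion, the sketch below contrasts bag-of-words features with bigram (2-gram) features. The tokenization by whitespace and the helper names are assumptions for this example, not the project's actual code.

```python
from collections import Counter
from typing import List


def ngrams(text: str, n: int) -> List[str]:
    """Return all runs of n consecutive lowercase tokens in the text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text: str) -> Counter:
    return Counter(ngrams(text, 1))


def bag_of_bigrams(text: str) -> Counter:
    return Counter(ngrams(text, 2))


a = "you won money"
b = "money won you"

# Bag of words ignores word order, so both messages look identical ...
print(bag_of_words(a) == bag_of_words(b))
# ... while bigrams keep some local word order and tell them apart.
print(bag_of_bigrams(a) == bag_of_bigrams(b))
```

Because bigrams capture pairs of neighbouring words, they also soften the independence assumption mentioned in the last bullet, at the cost of a much larger feature space.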