Implementation of Proactive Spam Fighting Techniques
Master's Thesis
by Martin Gräßlin, Ruprecht-Karls-Universität Heidelberg
Supervisors: Prof. Dr. Gerhard Reinelt, Prof. Dr. Felix Freiling
3 March 2010
Declaration of Honor
I affirm that I have written this Master's thesis independently, that I have used only the sources and aids indicated, and that I have observed the principles and recommendations „Verantwortung in der Wissenschaft“ (Responsibility in Science) of Heidelberg University.
Place, Date Martin Gräßlin
Abstract
One of the biggest challenges in global communication is to overcome the problem of unwanted emails, commonly referred to as spam. In the last years many approaches to reduce the number of spam emails have been proposed. Most of them have in common that the end-user is still required to verify the filtering results. These approaches are reactive: before mails can be classified as spam in a reliable way, a set of similar mails has to be received. Spam fighting has to become proactive. Unwanted mails have to be blocked before they are delivered to the end-user’s mailbox.

In this thesis the implementation of two proactive spam fighting techniques is discussed. The first concept, called Mail-Shake, introduces an authentication step before a sender is allowed to send emails to a new contact. Computers are unable to authenticate themselves and so all spam messages are automatically blocked. The development of this concept is discussed in this thesis. The second concept, called Spam Templates, is motivated by the fact that spam messages are generated from a common template. If we gain access to the template we are able to identify spam messages by matching the message against the template. As the template is generated from currently sent spam messages, the template will never match a legitimate mail. In this thesis matching a mail against a template is implemented.

In the scope of this thesis an evaluation for the Mail-Shake concept is provided. This evaluation shows that Mail-Shake is able to reduce the number of received spam messages and mails containing malicious software.
Acknowledgement
First of all I want to thank Professor Gerhard Reinelt and Professor Felix Freiling for making it possible for me to write this thesis at the Laboratory for Dependable Distributed Systems at the University of Mannheim. I also want to thank my supervisors Jan Göbel and Philipp Trinius. Their suggestions and feedback are very much appreciated and helped to develop the system presented in this thesis.

A special thanks to all my friends and my family for testing the system and providing valuable feedback on its usability. I especially want to thank Arthur Arlt who was always willing to discuss details about the implementation and this document.

I want to thank the KDE community and Qt Development Frameworks for providing such a great and coherent development framework. The KDE community has helped me improve my C++ coding skills during the last years. This was useful during the implementation as many problems were already known and could be solved easily.

In general I want to thank the complete Free and Open Source community. Without their ideas of free software it would not have been possible to realize such a project. The complete project including this document has been implemented and written with the help of Free or Open Source software.

Last but not least I want to thank my parents for their financial support during my Master studies so that I could concentrate on my classes.
Contents
1 Introduction
  1.1 Motivation
  1.2 Proactive Spam Fighting Techniques
  1.3 Notes About the Implementation
  1.4 Structure of This Thesis
2 Proactive Spam Fighting
  2.1 Related Work
    2.1.1 Bayesian Filtering
    2.1.2 DNS Blacklists
    2.1.3 URI Blacklist
    2.1.4 Greylisting
    2.1.5 Conclusion
  2.2 The Mail-Shake Concept
    2.2.1 Proactive Spam Fighting With Dynamic Whitelists
    2.2.2 Limitations of the Mail-Shake Concept
    2.2.3 Summary
  2.3 The Spam Templates Concept
    2.3.1 Template Based Spam Mails
    2.3.2 Generation of Templates
    2.3.3 Proactive Filtering
    2.3.4 Summary
3 Background
  3.1 Evaluation of Current CAPTCHA Techniques
    3.1.1 Introduction
    3.1.2 Simple Obfuscation
    3.1.3 Image Based CAPTCHAs
    3.1.4 Audio Based CAPTCHAs
    3.1.5 Image Recognition CAPTCHAs
    3.1.6 Riddle
    3.1.7 reCAPTCHA
    3.1.8 Conclusion
  3.2 Excursus: Breaking a CAPTCHA System
    3.2.1 The Scr.im CAPTCHA System
    3.2.2 Flaws in the Design of the Scr.im CAPTCHA System
    3.2.3 Attack on the CAPTCHA System
    3.2.4 Lessons Learned
  3.3 Akonadi
    3.3.1 Client Plugins Compared to Central Storage
    3.3.2 Akonadi as the Central Storage Solution
    3.3.3 Design of Akonadi
    3.3.4 Summary
4 Development of the Systems
  4.1 Software Requirements for Mail-Shake
    4.1.1 Answering Spam Messages
    4.1.2 Delivery Status Notifications
    4.1.3 Public Mail Address
    4.1.4 Sending Mails
    4.1.5 Private Mail Address
    4.1.6 Summary
  4.2 Design of Mail-Shake
    4.2.1 Client Independent Library
    4.2.2 Akonadi Agent
    4.2.3 Client Integration
    4.2.4 Summary
  4.3 Implementation of Mail-Shake
    4.3.1 Mail-Shake Library
    4.3.2 Mail-Shake Akonadi Agent
    4.3.3 Mail-Shake Integration in Email Clients
  4.4 Implementation of Spam Templates
    4.4.1 Generating the RSS Feed
    4.4.2 Testing a Mail
    4.4.3 Summary
5 Evaluation
  5.1 Mail-Shake Evaluation Setup
  5.2 Results of Mail-Shake Evaluation
  5.3 Greylisting
  5.4 Results from January
  5.5 Results from February
  5.6 Summary
6 Retrospection and Future Tasks
  6.1 Problems caused by Akonadi
  6.2 Future tasks for Spam Templates
  6.3 Future Tasks for Mail-Shake
    6.3.1 Handling of Delivery Status Notifications
    6.3.2 Mail-Shake for Several Addresses
    6.3.3 Solving Mail-Shake Challenges in Email Clients
    6.3.4 Integrating Mail-Shake Directly Into Email Clients
  6.4 CAPTCHA Security
7 Conclusion
A Examples of Delivery Status Notifications
  A.1 RFC Compliant
  A.2 Exim
  A.3 QMail
    A.3.1 MIME Mail
    A.3.2 Plain Text Mail
  A.4 Google Mail
B Mails from Automated Systems
  B.1 Review Board
  B.2 Bugzilla
C Mail-Shake API Documentation
  C.1 MailShake Namespace Reference
    C.1.1 Detailed Description
    C.1.2 Typedef Documentation
    C.1.3 Enumeration Type Documentation
  C.2 MailShake::DSN Class Reference
    C.2.1 Detailed Description
    C.2.2 Member Function Documentation
  C.3 MailShake::DSNPrivate Class Reference
  C.4 MailShake::EMail Class Reference
    C.4.1 Detailed Description
    C.4.2 Member Function Documentation
  C.5 MailShake::EMailPrivate Class Reference
  C.6 MailShake::Id Class Reference
    C.6.1 Detailed Description
    C.6.2 Member Function Documentation
  C.7 MailShake::IdPrivate Class Reference
  C.8 MailShake::MailShake Class Reference
    C.8.1 Detailed Description
    C.8.2 Member Function Documentation
  C.9 MailShake::MailShakePrivate Class Reference
    C.9.1 Member Function Documentation
    C.9.2 Member Data Documentation
  C.10 MailShake::WhiteListEntry Class Reference
    C.10.1 Detailed Description
    C.10.2 Member Function Documentation
  C.11 MailShake::WhiteListEntryPrivate Class Reference
D Mailman Archive Address Harvester
  D.1 main.cpp
  D.2 mailmanharvester.h
  D.3 mailmanharvester.cpp
  D.4 mailmanharvesterview.h
  D.5 mailmanharvesterview.cpp
  D.6 mailmanharvesterviewbase.ui
E Automated Scr.im CAPTCHA Solver
  E.1 main.cpp
  E.2 ScrimCracker.h
  E.3 ScrimCracker.cpp
  E.4 CMakeLists.txt
F Dialog to Solve a Mail-Shake Challenge
  F.1 mailshakedialog.h
  F.2 mailshakedialog.cpp
G RSS Generator
  G.1 main.cpp
  G.2 rssgenerator.h
  G.3 rssgenerator.cpp
  G.4 CMakeLists.txt
H Spam Templates Library
  H.1 template.h
  H.2 template.cpp
  H.3 templatemanager.h
  H.4 templatemanager.cpp
  H.5 mail.h
  H.6 mail.cpp
List of Figures
2.1 Overview of the Mail-Shake email process
2.2 Example of a Mail-Shake challenge mail
2.3 Leakage of private Mail-Shake address
2.4 Mail-Shake authentication initiated on private address
2.5 Mail loop triggered by a spam mail with a not valid sender address
2.6 Web Service as relay of a mail
2.7 Template based spamming
2.8 Example of a generated Spam template
3.1 Example of a reCAPTCHA
3.2 CAPTCHA containing email address “[email protected]”
3.3 Example of an Asirra CAPTCHA
3.4 The words to be used for reCAPTCHA
3.5 The scr.im CAPTCHA system
3.6 Different images for the same scr.im CAPTCHA
3.7 Comparison of original CAPTCHA image and the result of the pixel shader
3.8 Two different applications to handle public and private addresses
3.9 Basic aspects of the Akonadi architecture
3.10 Components of Akonadi
4.1 Abusing Mail-Shake to send spam
4.2 Activity diagram for processing mails sent to the public address
4.3 Activity diagram for sending mails
4.4 Activity diagram for receiving mails on private address
4.5 Classes EMail and DSN of the Mail-Shake library
4.6 Class WhiteListEntry of the Mail-Shake library
4.7 High Level Class Diagram of the Mail-Shake library
4.8 High Level Class Diagram of Mail-Shake’s client side implementation
4.9 Communication between Akonadi server, agent and Mail-Shake library
4.10 Class diagram for Mail-Shake email client integration
4.11 Classes EMail and DSN split in interface and implementation classes
4.12 Template of a Mail-Shake challenge
4.13 Dialogs to configure the whitelist
4.14 Notification upon receipt of not whitelisted mail
4.15 Mailody Message View with Mail-Shake challenge mail integration
4.16 Dialogs to solve the Mail-Shake challenge
4.17 Configuration for determining the matching score
5.1 Rejected mails in December 2009 on the evaluated MTA
5.2 Rejected, bounced and junk mails in January 2010 on the evaluated MTA
6.1 Mail-Shake Agents in the systray
D.1 Application for extracting addresses from Mailman archives
List of Tables
4.1 Examples for subjects containing a Mail-Shake id
4.2 Size of Mail-Shake measured in Source Lines of Code
4.3 Database structure of Mail-Shake agent
4.4 Mail headers used in Mail-Shake challenge and notification mails
4.5 Changed files for Mail-Shake challenge integration in Mailody
4.6 Command line options for the RSS generation tool
5.1 Private and public addresses used during the Mail-Shake evaluation
5.2 Number of mails filtered by Mail-Shake in January 2010 for the different addresses
5.3 Statistics for filtered mail per address in January
5.4 Number of mails filtered by Mail-Shake in February 2010 for the different addresses
5.5 Statistics for filtered mail per address in February
List of Listings
3.1 Pixel shader for extracting characters from the scr.im CAPTCHAs
4.1 Matching a string against a whitelist entry
4.2 Trivial algorithm to check if a mail is whitelisted
4.3 Comparing the whitelist entries to a given datum
4.4 Improved algorithm to test if a mail is whitelisted based on a smarter data structure
4.5 Handling the receipt of a mail sent to the public address
4.6 Generating a new unique identifier
4.7 Extracting the Mail-Shake Id from a mail subject
4.8 Checking if received private mail is whitelisted or a DSN
4.9 Checking if mail contains a challenge response Id or is on temporary whitelist
4.10 Move an entry from temporary to permanent whitelist or create a new one
4.11 Adding a whitelist entry for each recipient of a sent mail
4.12 Connecting a slot to the signal with the boost library
4.13 Slot for removing one Id from the storage
4.14 Connecting Signals and Slots with Qt
4.15 Fetching a mail sent to the public address
4.16 Extracting headers from a KMime message
4.17 Extracting the Mail-Shake headers in Mailody
4.18 Displaying Mail-Shake challenge information in Mailody’s header widget
4.19 Intercepting a click on a link in order to open the Mail-Shake challenge dialog
4.20 Extracting CAPTCHA from the reCAPTCHA web page
4.21 Testing if the web page contains the revealed mail address
4.22 Generating an RSS item from one template file
4.23 Generated RSS feed containing one template
4.24 Algorithm for matching a mail body against a template
4.25 Forward and backward search for a matching line based on fuzziness
1 Introduction
1.1 Motivation
Unsolicited bulk emails, or in general spam or junk mails, have become one of the greatest challenges of current global communication. About 80 percent of the world’s email communication is not legitimate[10]. This includes not only spam mails but also malicious software and phishing mails. These mails cause a global economic loss of EUR 36 billion each year plus EUR ten billion lost due to fraud[3]. 33 billion kWh are required to process the 62 trillion spam mails each year and 104 billion user hours are required to check and delete these junk mails[42].

Unfortunately sending spam messages is a profitable business: in the year 2002 a study showed that out of 3.5 million sent messages 81 sales were generated in the first week of the campaign, resulting in an income of USD 1,500[48]. These numbers can be confirmed with more recent information disclosed by a former spammer: sending 40 million mails can render a weekly income of USD 37,440[58].

Spam is also one of the reasons why there is malicious software at all. Next to Distributed Denial of Service attacks (DDoS), botnets are used to send spam mails[50]. About 10 million zombie computers organized in botnets are actively sending out spam and email-based malicious software. As the zombies are added and removed dynamically to prevent static blacklist solutions from blocking the zombies[9], it can be assumed that there are many more computers controlled by the bots. A single zombie of a Storm botnet sends an average of 1.04 spam mails per second, up to 136,000 mails per day[15].

As long as there are enough people buying products advertised by spam or following the hyperlinks in phishing mails there will be spam. It is unlikely that this social problem can be solved by software. Of course modern web browsers can help to protect the users against fraud like phishing, but in the end only a better education will prevent people from being defrauded by spam and phishing.
Likewise, modern software cannot protect end-users who use outdated software and do not care or are unable to update to a more recent and secure version. This implies that there has to be some effort to reduce the number of received junk mails and to lower the risk of being defrauded by phishing. Therefore it is required that unsolicited mails are recognized in a reliable way which does not require manual control.
Current spam fighting techniques like Bayesian filters or Uniform Resource Identifier Blacklists (URIBL), which are discussed in Chapter 2.1, are commonly reactive. They require a large set of received spam messages to extract features such as URIs referenced in a message. With the help of the extracted features the algorithms can distinguish spam from ham messages (valid messages). But this reactive approach has disadvantages because it must first receive the spam messages. As long as new features are not extracted, the techniques cannot identify messages as junk. This is an annoyance for users as the techniques produce false negative results and the users have to delete the unrecognized spam manually.

Spam fighting has to become proactive: preventing spam messages from being delivered to the end-users’ mailboxes at all, or at least providing spam recognition solutions which are able to remove messages based on new spam patterns at the same time as the new pattern is used for the first time.
1.2 Proactive Spam Fighting Techniques
In this thesis the implementation of two proactive spam fighting techniques is discussed. These techniques aim to prevent spam mails from being delivered to users at all and to recognize new spam faster and in a more reliable way.

The first technique, called Mail-Shake, is a concept which prevents spam or at least makes it more difficult for spammers to send spam. To this end each sender has to authenticate once that he is a human. Mails sent from unauthenticated senders are dropped automatically, so spammers are unable to deliver their junk. This concept is discussed in more detail in Chapter 2.2.

The second technique helps to identify received spam mails in a more reliable way. By intercepting mails sent by a bot, generic templates are generated and used to identify spam mails even if other techniques are not yet able to recognize the email as spam. The construction and usage of Spam Templates is discussed in more detail in Chapter 2.3.

The hope is that these techniques help reduce the number of spam mails received by users and the time which is required to check for and sort false positives and negatives. The Mail-Shake concept is immune against false positives as only mails sent by computers are classified as spam mails. The Spam Templates on the other hand will not mark mails sent by humans as spam because the template is constructed in a way to only match mails sent by a bot.

These techniques can and should be used in combination with other existing spam fighting techniques such as Greylisting, blacklists and Bayes filtering systems, which are discussed briefly in Chapter 2.1.
1.3 Notes About the Implementation
The two techniques, Mail-Shake and Spam Templates, are developed independently but using the same libraries and technologies. Both applications are built upon the Personal Information Management (PIM) framework developed and used by the KDE community. This framework, called Akonadi, is completely client and platform independent; the supported platforms are currently Linux (and other Unixes), Microsoft Windows and Mac OS X. As the underlying KDE and Qt libraries are being ported to more platforms such as smart phones, Akonadi will probably become available on those as well.

Although Akonadi has been developed for the usage in KDE’s PIM suite “Kontact” it was designed with client independence in mind. So there are already different KDE applications available which use Akonadi, and some prototype applications developed in different programming languages and with different GUI libraries. The Akonadi framework is discussed in Chapter 3.3.

The combination of platform and client independence has the advantage that the applications developed in the scope of this thesis can be used with different email clients. Nevertheless the applications are developed in a way so that their code can easily be reused by other projects to provide a more native integration. Therefore an abstraction layer is implemented and used.
1.4 Structure of This Thesis
In the current Chapter a short introduction and motivation for implementing proactive spam fighting techniques was presented. The applications whose implementation is discussed in this thesis were named and a short introduction to the framework used to develop the applications was provided.

The following Chapter 2 discusses the proactive spam fighting techniques. First of all related work, in this case other existing but reactive spam fighting techniques, is presented. This motivates the discussion of the two techniques: Mail-Shake and Spam Templates.

Before the implementation can be discussed, an overview of the background of the system is provided in Chapter 3. This includes an evaluation of current CAPTCHA¹ techniques in Chapter 3.1, which is required for implementing Mail-Shake. In Chapter 3.2 an example of an automated solution to break a CAPTCHA system is presented as an excursus, which motivates the chosen solution not to implement an own CAPTCHA, but to rely on existing and tested functionality. Last but not least a closer look at the KDE personal information management framework Akonadi in Chapter 3.3 completes the chapter on the background of the system.

The discussion of the development of the system is encapsulated in Chapter 4. First of all the software requirements (Chapter 4.1) are presented, followed by the design (Chapter 4.2), the actual implementation of Mail-Shake in Chapter 4.3 and of Spam Templates in Chapter 4.4.
¹Completely Automated Public Turing test to tell Computers and Humans Apart
The following Chapter 5 evaluates the results. This shows whether the concepts presented in this thesis are able to reduce the number of received spam mails and whether the concepts are usable at all. The implementation allows easy reuse in different client implementations. Some possibilities for future work and a retrospection are presented in Chapter 6. Last but not least a conclusion on the results of this thesis is presented in the final Chapter 7.
2 Proactive Spam Fighting
In this Chapter the two concepts Mail-Shake and Spam Templates are discussed. Both concepts are proactive spam fighting techniques and are able to eliminate spam messages before they are shown to the end-user. This is an important difference from the existing, reactive techniques. Some of those reactive techniques are also presented in this Chapter.
2.1 Related Work
In this Section other existing spam fighting techniques are presented. Most of those techniques are reactive and share the disadvantages of reactive approaches. A brief overview of technologies like Bayesian filtering, blacklists and greylisting is provided and their advantages and disadvantages are discussed.
2.1.1 Bayesian Filtering
The most common spam fighting techniques are the Bayesian and rule-based filtering systems as used for example by Spam Assassin¹. These are examples of reactive spam fighting solutions: a large repository of both spam and ham messages is required to extract features from all messages. These extracted features can be used to distinguish spam from ham messages via a Bayesian model[55].

Rule-based filtering systems are reactive as well. To construct a rule it is required to first look at the spam messages. Using rules for spam filtering is rather limited as the logical rule set makes binary decisions whether to classify a given mail as spam[55]. This can easily result in false positives, as seen in January 2010 when dates that a Spam Assassin rule treated as grossly in the future became the present[29]. The misbehaving rule tests for dates in the year 2010 or later and each matching message receives an additional score between 2.075 and 3.554. As Spam Assassin classifies a message as spam at a score of 5.0 this rule causes many false positives.

These limitations of rule-based filtering systems can be circumvented by feature extraction and the use of Bayesian filtering systems. Nevertheless a Bayesian system is not the perfect solution either. For example it can only extract features from text messages and is unable to filter image based spam. The number of image based spam mails increased significantly in 2006[8] and the images are distorted by applying techniques used for CAPTCHAs, so that computers are unable to restore the original image[67].

¹http://spamassassin.apache.org/
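The effect of the misbehaving date rule can be illustrated with a small C++ sketch. It is not Spam Assassin code: the function and constant names are hypothetical, and only the threshold of 5.0 and the score range of 2.075 to 3.554 are taken from the text above. The sketch assumes that the scores of all matching rules are simply summed and compared with the threshold.

```cpp
#include <numeric>
#include <vector>

// Score at which a message is classified as spam (value from the text).
constexpr double kSpamThreshold = 5.0;

// Sums the scores of all rules that matched a message.
double totalScore(const std::vector<double>& matchedRuleScores) {
    return std::accumulate(matchedRuleScores.begin(),
                           matchedRuleScores.end(), 0.0);
}

// Binary decision of the rule set: spam if the sum reaches the threshold.
bool isSpam(const std::vector<double>& matchedRuleScores) {
    return totalScore(matchedRuleScores) >= kSpamThreshold;
}
```

Under these assumptions a legitimate mail with a harmless score of 1.5 stays below the threshold, but is classified as spam as soon as the date rule adds 3.554.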
2.1.2 DNS Blacklists
One of the most common techniques to block spam mails directly on the mail server is the use of a DNS blacklist (DNSBL). The name refers to the fact that the blacklist is queried with the help of the Domain Name System (DNS). To test if a given IP address a.b.c.d is listed in a certain blacklist the mail server just has to query for the A record for the address d.c.b.a.blacklist-name[30]. If the query is successful the mail should be rejected as the sender’s IP address is known to send spam.

DNSBLs are of course a reactive spam fighting approach. A given IP address has to be verified to be used for spamming. The important question is whether the IP addresses of bots get listed while the bot is actively sending out spam messages. A study from 2005 shows that DNSBLs are not capable of blocking spam sent by botnets. Out of 4,295 IP addresses which were known to be part of the Bobax botnet, only 225 were blacklisted in the DNSBL provided by Spamhaus²[52].

On the other hand a blacklist might easily block legitimate senders. For example if the IP address of a bot is assigned dynamically by its Internet Service Provider (ISP), the ISP might have assigned the same IP address to a different customer by the time the DNSBL includes this address. So the actual bot is not blocked, but a legitimate user is. An empirical study showed that 80 % of the IP addresses of possible spammers in February 2004 were still listed in at least one of seven popular DNSBLs two months later. Some of the IP addresses were already present in the DNSBLs in the year 2000[30].

The fact that a DNSBL can block any domain from sending mails is also a great disadvantage as the DNSBL can be abused. In 2007 the popular Spamhaus project demanded that the Austrian Network Information Center “nic.at” take down addresses used for phishing. As the registrar did not react, Spamhaus started to “blackmail” the registrar by listing its domain, so that nic.at could not send mails anymore[47, 51].
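The DNSBL lookup described at the beginning of this section boils down to reversing the octets of the IPv4 address and appending the name of the blacklist. The following C++ sketch only builds the name to query; the DNS lookup of the A record itself is left out. The function name is hypothetical, and so is the zone dnsbl.example.org used in the note below, which does not refer to a real blacklist.

```cpp
#include <array>
#include <sstream>
#include <string>

// Builds the DNS name to query for the IPv4 address "a.b.c.d" in the given
// blacklist zone: "d.c.b.a.<zone>". Returns an empty string if the address
// is malformed.
std::string dnsblQueryName(const std::string& ipv4, const std::string& zone) {
    std::array<std::string, 4> octets;
    std::istringstream in(ipv4);
    for (int i = 0; i < 4; ++i) {
        if (!std::getline(in, octets[i], '.')) return "";
        if (octets[i].empty() || octets[i].size() > 3) return "";
        for (char c : octets[i]) {
            if (c < '0' || c > '9') return "";
        }
        if (std::stoi(octets[i]) > 255) return "";
    }
    // The fourth octet must have been terminated by the end of the input.
    if (!in.eof()) return "";
    return octets[3] + "." + octets[2] + "." + octets[1] + "." +
           octets[0] + "." + zone;
}
```

For the address 192.0.2.99 and the zone dnsbl.example.org the function returns 99.2.0.192.dnsbl.example.org. A mail server would then query the A record of that name and reject the mail if the query succeeds.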
DNSBLs no longer seem to be an appropriate method for spam fighting. The reactive approach is unable to cope with the frequently changing IP addresses of spam sending bots and the chance that legitimate senders are blocked is too high. Especially the incident between Spamhaus and nic.at illustrates that the disadvantages of DNSBLs prevail.
²http://www.spamhaus.org/
2.1.3 URI Blacklist
A different form of blacklists are the Uniform Resource Identifier Blacklists (URIBL). Instead of blacklisting the IP address of senders, domain names referenced in mail bodies more often than a given threshold are included in the blacklist. A given mail is analyzed for URIs pointing to such a blacklisted domain name and in that case the mail is classified as spam[33].

In contrast to DNSBLs the complete mail has to be received and the content has to be analyzed. The approach is reactive and requires a large set of both spam and ham messages as the presence of a URL in the message body is not a reliable indicator for spam. Almost 90 % of legitimate mails contain URLs as well[33].

An advantage of URIBLs compared to other techniques is that only the URLs in the message body are analyzed. On the other hand the approach easily produces false negative results as it requires the presence of a URL. If a spam message does not contain a URL, as is the case for image spam, the message cannot be classified as spam.

Given the fact that URIBLs produce false negative results, they cannot be used as a standalone spam fighting solution, but have to be combined with other techniques. So the fact that a mail has been classified as spam by using a URIBL should only be seen as an indicator for spam.
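The lookup side of a URIBL can be sketched in a few lines of C++. The sketch assumes a local set of blacklisted domain names and a deliberately simple URL scanner that only recognizes http:// and https:// links; how domains get onto the blacklist is not modeled. All function names and example domains are hypothetical.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Extracts the lower-cased host part of every http:// or https:// URL.
std::vector<std::string> extractHosts(const std::string& body) {
    std::vector<std::string> hosts;
    static const std::string schemes[] = {"http://", "https://"};
    for (const std::string& scheme : schemes) {
        std::size_t pos = 0;
        while ((pos = body.find(scheme, pos)) != std::string::npos) {
            const std::size_t start = pos + scheme.size();
            std::size_t end = start;
            while (end < body.size() &&
                   (std::isalnum(static_cast<unsigned char>(body[end])) ||
                    body[end] == '.' || body[end] == '-')) {
                ++end;
            }
            if (end > start) {
                std::string host = body.substr(start, end - start);
                for (char& c : host) {
                    c = static_cast<char>(
                        std::tolower(static_cast<unsigned char>(c)));
                }
                hosts.push_back(host);
            }
            pos = end;  // continue scanning behind this URL
        }
    }
    return hosts;
}

// True if any URL points to a blacklisted domain or one of its subdomains.
bool referencesBlacklistedDomain(const std::string& body,
                                 const std::set<std::string>& blacklist) {
    for (const std::string& host : extractHosts(body)) {
        for (const std::string& domain : blacklist) {
            if (host == domain) return true;
            const std::string suffix = "." + domain;
            if (host.size() > suffix.size() &&
                host.compare(host.size() - suffix.size(), suffix.size(),
                             suffix) == 0) {
                return true;
            }
        }
    }
    return false;
}
```

A mail without any URL is never flagged by this check, which is exactly the source of the false negatives discussed above.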
2.1.4 Greylisting
Greylisting is a combination of a black- and a whitelist with automatic whitelist management. Each newly received mail is initially rejected on the Mail Transfer Agent (MTA) and the unique triplet of IP address of the sending host, sender address and recipient address in the envelope is stored. If the sending host tries to deliver the mail again after a defined delay, the mail is accepted and the triplet is moved to the whitelist. All further communication from this triplet will not be delayed[26].

Greylisting is based on the assumption that spam sending applications are using a “fire-and-forget” approach. If a spam message cannot be delivered the application does not try to resend the message, although temporary failures are always possible. The first testing of greylisting in mid-2003 showed an effectiveness of 95 %[26]. Unfortunately this effectiveness is based on the fact that spam sending applications do not implement SMTP correctly. By adapting the spam sending applications to circumvent the protection provided by greylisting, the success rate can be decreased. In Chapter 5.3 an evaluation of the current effectiveness of greylisting is provided.

The greylisting approach has some disadvantages. First of all each legitimate mail is delayed if a sender tries to deliver a mail for the first time. If several MTAs are used to send outgoing mails, it is possible that each mail is delayed as the IP address stored in the unique triplet changes for each mail. It is even possible that legitimate mails are blocked completely because the hosts do not retry to send the mail or handle the temporary failure as a permanent one and return the mail to the end-user[37].
Even when greylisting breaks because spammers adapt their applications, it is useful to continue to use greylisting. Basically greylisting binds resources on the spammer’s side. The spammer has to use a mail queue and cannot continue to use a fire-and-forget approach. Due to the fact that the host’s IP address is part of the unique triplet, the same bot has to send the message after the delay. There is the chance that by this time reactive approaches such as DNSBLs include the bot’s IP address in their blacklists and so the spam message can be blocked, although the greylist is overcome.
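The triplet handling described in this section can be sketched as a small C++ class. This is a minimal illustration, not the behavior of any particular MTA: the class and type names are hypothetical, time is passed in as plain seconds, and the minimum delay is a constructor parameter because the text only speaks of a defined delay.

```cpp
#include <map>
#include <set>
#include <string>
#include <tuple>

// (IP address of sending host, envelope sender, envelope recipient)
using Triplet = std::tuple<std::string, std::string, std::string>;

enum class Verdict { TempReject, Accept };

class Greylist {
public:
    explicit Greylist(long minDelaySeconds) : minDelay_(minDelaySeconds) {}

    // Decides how to treat a delivery attempt seen at time `now` (seconds).
    Verdict check(const Triplet& triplet, long now) {
        if (whitelist_.count(triplet) > 0) return Verdict::Accept;
        auto it = firstSeen_.find(triplet);
        if (it == firstSeen_.end()) {
            // First attempt: remember the triplet and reject temporarily.
            firstSeen_[triplet] = now;
            return Verdict::TempReject;
        }
        // Retried before the delay has passed: keep rejecting.
        if (now - it->second < minDelay_) return Verdict::TempReject;
        // Retried after the delay: move the triplet to the whitelist.
        whitelist_.insert(triplet);
        firstSeen_.erase(it);
        return Verdict::Accept;
    }

private:
    long minDelay_;
    std::map<Triplet, long> firstSeen_;
    std::set<Triplet> whitelist_;
};
```

Because the sending host's IP address is part of the key, a retry from a different host counts as a first attempt, which reflects the disadvantage for senders with several outgoing MTAs mentioned above.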
2.1.5 Conclusion
As this Section illustrated, none of the presented existing techniques is able to reliably distinguish spam from ham messages. Most of the existing techniques are reactive and require that first a large set of false negative results is generated. Based on these false negative results the techniques can be improved to identify spam messages in future. But this is of course an annoyance for the end-user as unfiltered messages appear in the mailbox and the user has to filter those manually.

A more proactive spam fighting approach is required. Spam messages have to be identified before the messages hit the end-user’s mailbox. Greylisting is a step in the right direction as it blocks spammers, but it is only a solution until it is commonly used, as the spammers will adjust their software.
2.2 The Mail-Shake Concept
In this Section the Mail-Shake concept as described in [19] is discussed. First of all the idea is presented, followed by a discussion of how and why the concept works. Finally some of the limitations are named together with ways to circumvent them.
2.2.1 Proactive Spam Fighting With Dynamic Whitelists
The basic idea behind Mail-Shake is to block all mails from unauthenticated senders and to provide senders an easy way to authenticate themselves. The process of authentication is done in a way so that humans are able to participate, while computers - and by that spam bots - are not. After authentication the sender’s address is put on a whitelist. This whitelist is used by Mail-Shake to decide if a mail is authorized or not. By that the concept is proactive as it blocks spam before it is read by the user.
[Figure: message flow between User A (private address) and User B (public and private address). Arrow labels: send initial email (recipient placed on whitelist); reply with challenge (and random ID); resend initial email (and ID in subject; update whitelist entry); future communication (address placed on whitelist).]
Figure 2.1: Overview of the Mail-Shake email process[19]
These initial steps of authentication are illustrated in Figure 2.1. A sender (User A) has to send a mail to User B’s public email address. All mails sent to the public address are discarded, but answered with a challenge mail containing a unique identifier. User A has to solve the challenge, which reveals User B’s private address. Now User A can resend the original mail with the identifier in the subject. Mail-Shake compares the identifier and puts User A’s address on a whitelist. In future User A can send mails directly to User B’s private address. The authentication step is required only once, and there is no need to include the identifier in each single mail. Other mails sent to the private address are discarded if the sender address is not on the whitelist.
The challenge, which reveals the private email address, has to be designed so that it is solvable by a human but not by a computer, that is, a kind of CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). An example of such a challenge mail is presented in Figure 2.2. The actual challenge is implemented by relying on the reCAPTCHA web service, which offers the possibility to protect an email address with a CAPTCHA.
A Mail-Shake user can publish the public address openly on the web. If a spam bot obtains this address, all spam mails sent to the public address are not read and the spam bot is unable to obtain the private address from the Mail-Shake challenge mails.
Subject: Mail-Shake challenge
You sent an email to a public Mail-Shake address. The email will not be delivered. You have to send the email to the private address. You can retrieve the private address by visiting the following web address and solving the shown CAPTCHA:
http://mailhide.recaptcha.net/d?k=01CnVIbRzbs1dYsDFRJi_3RQ==&c=6pdRWDUzBNLbqFDUM-P8vMAb9FJMDoP3HqWDsEQZPoI=
The email to the private address will only be delivered if you include the following text in the subject of your email:
Mail-Shake Id: 37735
In future you can send emails directly to the private email address as normal. If you did not send an email to the public address you can ignore this email.
Figure 2.2: Example of a Mail-Shake challenge mail
If a spam bot obtains the private email address, the bot is still unable to send junk to the Mail-Shake user. The addresses used in the spam mails are not on the whitelist and the mails are dropped. So a spam bot is unable to send spam mails without going through the authentication steps. As it might be possible that a spam bot obtains both the public and the private address, the unique identifier is introduced. Without the identifier it would be sufficient to just send a mail to the public address followed by a mail to the private address. Introducing the unique identifier ensures that a Mail-Shake challenge mail has to be received. Consequently, all spam bots using forged sender addresses are unable to receive the required unique identifier.
The concept discussed so far is unusable when two Mail-Shake users try to communicate with each other. User A tries to send a mail to User B’s public address. As with all outgoing communication, User A sends the mail from his private address. User B’s Mail-Shake setup therefore sends the challenge mail to a private address. As there has not been any communication before, the challenge mail is dropped. To circumvent this problem, a temporary whitelist is introduced. Whenever a Mail-Shake user sends a mail, the recipient address is added to the temporary whitelist. A response mail is therefore not blocked and the challenge mail can be received. The temporary whitelist also guarantees that all communication started by a Mail-Shake user is possible.
Mail-Shake makes it more difficult for spammers to deliver their junk. It requires solving one CAPTCHA for each recipient address. That increases the cost of sending spam as long as the CAPTCHA is secure and has to be solved by a human. Even if a spammer knows both the private and the public address, he needs to be able to receive the challenge mail to get on the whitelist. The user will most likely delete the entry from the whitelist, so for each spam mail to be sent, the spammer has to send two mails and receive one. Nevertheless there is one possible way to circumvent the Mail-Shake protection: infecting a system with malware and sending spam to all addresses gathered on this system, disguised as the valid sender. In that case a Mail-Shake user will remove the address, and Mail-Shake should inform the sender, as this is a strong indicator that the system is infected.
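The decision logic described in this Section can be summarised in a small model. The following is a minimal sketch, not the implementation discussed later in this thesis: the class and method names are hypothetical, the CAPTCHA step is not modelled (the identifier is simply returned), and binding each identifier to the sender it was issued to is an assumption of the sketch. The subject format "Mail-Shake Id: ..." follows the challenge mail in Figure 2.2.

```python
import re
import secrets


class MailShake:
    """Toy model of the Mail-Shake decision logic for one protected user."""

    def __init__(self, public: str, private: str):
        self.public = public
        self.private = private
        self.whitelist = set()       # permanently authorized senders
        self.temp_whitelist = set()  # recipients of our own outgoing mails
        self.pending_ids = {}        # identifier -> sender it was issued to

    def outgoing(self, recipient: str) -> None:
        # Anyone we write to may answer, for example with a challenge mail.
        self.temp_whitelist.add(recipient)

    def incoming(self, sender: str, recipient: str, subject: str) -> str:
        if recipient == self.public:
            # Mails to the public address are discarded but answered
            # with a challenge that carries a fresh identifier.
            ident = str(secrets.randbelow(90000) + 10000)
            self.pending_ids[ident] = sender
            return "challenge:" + ident
        if sender in self.whitelist or sender in self.temp_whitelist:
            return "deliver"
        match = re.search(r"Mail-Shake Id: (\d+)", subject)
        if match and self.pending_ids.get(match.group(1)) == sender:
            # Correct identifier, sent by the address it was issued to.
            del self.pending_ids[match.group(1)]
            self.whitelist.add(sender)
            return "deliver"
        return "drop"


shake = MailShake("b.public@example.org", "b.private@example.org")
reply = shake.incoming("a@example.net", "b.public@example.org", "Hello")
ident = reply.split(":", 1)[1]
resent = shake.incoming(
    "a@example.net", "b.private@example.org", "Hello Mail-Shake Id: " + ident
)
```

A sender with a forged address never sees `ident`, so a mail to the private address without it falls through to the final `drop`.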
2.2.2 Limitations of the Mail-Shake Concept
2.2.2.1 Leaking of Private Address
To assume that the private address will not be leaked is rather naive. If User A has a Mail-Shake protected setup and User B is authorized to send mails to User A’s private address, the address leaks if User B sends one mail to User A and a third User C. If User C replies to both User B and User A, the mail to User A is dropped, as illustrated in Figure 2.3(a).
Another scenario for leakage is that the Mail-Shake User A sends a mail to User B, who is not authorized. Mail-Shake adds User B’s address to the temporary whitelist automatically, so that response mails are accepted. In this scenario User B does not even know that User A is using Mail-Shake. By forwarding User A’s mail to User C the private address is leaked. If User C tries to send a mail to User A the mail is dropped. This scenario is illustrated in Figure 2.3(b).
[Figure: two message flow diagrams between User A, User B and User C.]
(a) 1.: User B sends one mail to both the private address of User A and to User C. 2.: User C sends a mail to User A, which is dropped as his address is not on the whitelist.
(b) 1.: User A sends mail to User B and adds B’s address to his temporary whitelist. 2.: User B forwards mail to User C. 3.: User C sends mail to User A, which is dropped as his address is not on the whitelist.
Figure 2.3: Leakage of private Mail-Shake address
[Figure: message flow between Sender and Receiver (private address, public address, storage).]
1. Sends mail to private address
2. Drops not whitelisted mail
3. Sends notification mail with challenge for public address
4. Solves challenge, receives public address
5. Sends mail to public address
6. Generates Id and adds it to storage
7. Sends regular challenge mail
8. Resends original mail with Id in subject
9. Updates the whitelist
Figure 2.4: Mail-Shake authentication initiated on private address
Given these two scenarios it is unacceptable to drop mails without any notice to either receiver or sender, as proposed in the Mail-Shake concept paper. While this is an annoyance for private mail communications, it might have legal consequences in corporate or governmental institutions. A financial loss is also possible if for example orders are sent to a Mail-Shake protected private address.
Of course a notice to the user of Mail-Shake (User A) does not make sense, as it requires manually checking all filtered mails, and in that case Mail-Shake does not provide any advantage in comparison to existing solutions. Therefore the sender (User B) has to be notified that the mail has been dropped. This notification must offer a way to send mails to User A. Publishing User A’s public address in the notification is not an option, as then both the public and the private address are known to User B. He would only need to generate a unique identifier by sending a mail to the public address in order to send mails to the private address in future. This implies that he would not have to authenticate himself as a human. Therefore the notification must include the public address as a challenge. The workflow for authentication in this scenario is illustrated in Figure 2.4. The challenge reveals User A’s public address and User B has to send a mail to the public address in order to generate an identifier. This identifier has to be sent with the original mail to the already known private address. This workflow requires one more mail to be sent and makes the complete process more complex.
Another approach to solve this limitation is to include the unique identifier in the notification and to add the sender’s address to the whitelist when a mail with the correct Id is sent to the public address. This approach has not been pursued as mails to the public address are not delivered to the end-user.
While it allows adding an address to the whitelist, the sender nevertheless has to resend the original mail to the private address. In addition, there is the chance that a user who knows the Mail-Shake concept expects the public address to be the private one.
2.2.2.2 Communication with Automated Systems
There is a usability limitation in all kinds of communication with automated systems, such as mailing lists, bulletin boards and online shops. The idea behind Mail-Shake is to only accept mails from senders who authenticate themselves as humans. In the case of such automated systems we want to receive mails although the sender is not a human. The way to authenticate an address does not work for e.g. newsletters. In most cases an automated system does not accept mails at all, and even if it does, there is nobody solving the challenge and changing the address. So using the public address is not an option for communication with automated systems. This implies that every time mails from an automated system are expected (e.g. when creating a new account in an online shop) the private address has to be specified. At this point each mail sent by the automated system is dropped, as the whitelist does not yet contain an entry for the address. It is the user’s obligation to add the address manually. Unfortunately the address used by the system is in general unknown to the user.
A possible solution to this problem is to ignore it. If mails that are not whitelisted are not deleted but just moved to a different mail folder, the user can check this folder for the mail. Of course this is a very suboptimal solution, as the user has to look through a folder full of junk mails. It is also no solution if mails are automatically deleted, as the original approach suggested.
In [19] a proposed solution to this limitation is to allow wildcards in whitelist entries. That is, if the user created an account at the online shop “Foo”, he can manually create an entry that matches “*foo*”. Of course this does not only match mails sent by the shop, but also junk mail disguised as mails sent from this shop. So the user either has to change the entry as soon as the first mail sent from this shop has been received, or specify a more specific rule like “*@foo.de”.
But this would not match, for example, mails sent from addresses like “[email protected]”. Adding more wildcards to the domain part of the address does not solve this problem, as it opens the door to spammers using domains like “foo.bar.de”, which would be matched by an entry like “*@*foo*”.
The implementation, which is discussed later in this thesis, supports these two possibilities to circumvent this limitation of communication with automated systems. But it also supports a third one: instead of dropping the mail directly, a processing delay is introduced. The user receives a notification in the user interface that a mail will be dropped, and he can add the sender address to the whitelist before the mail is dropped. To provide better usability the notification can be toggled on and off. So before a user registers himself at a web shop he can activate the notifications, wait until the first mail from this automated system is received and then turn off the notifications. Waiting for the first received mail allows whitelisting exactly the domain used by the automated system, so that the entry should not match junk mails using parts of the domain name in their sender address.
Unfortunately the assumption that all mails from one web shop will use the domain used in the first mail does not hold, as the evaluation (see Chapter 5.2 on page 90) showed. If an online shop Foo is a subsidiary of company Baz, the registration mail might be sent from domain “foo.de” while further communication is sent from “baz.de”. In such a case a false positive is generated. On the other hand this proves that the Mail-Shake concept works correctly.
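The trade-off between broad and strict wildcard entries can be tried out with shell-style pattern matching. This is a minimal sketch using Python’s standard `fnmatch` module; the function `is_whitelisted` and all sender addresses are hypothetical examples and not part of the implementation discussed later.

```python
from fnmatch import fnmatchcase


def is_whitelisted(sender: str, patterns: list[str]) -> bool:
    """Return True if the sender address matches any wildcard whitelist entry."""
    address = sender.lower()
    return any(fnmatchcase(address, pattern.lower()) for pattern in patterns)


# Hypothetical sender addresses for an online shop "Foo".
shop_mail = "orders@foo.de"            # the shop's main domain
subdomain_mail = "orders@mail.foo.de"  # the shop, sending from a subdomain
spam_mail = "offers@foo.bar.de"        # a spammer abusing the shop's name

broad = ["*foo*"]           # matches the shop, but also the spammer
strict = ["*@foo.de"]       # rejects the spammer, but also the subdomain
loose_domain = ["*@*foo*"]  # accepts the subdomain, and the spammer again

for name, patterns in [("broad", broad), ("strict", strict), ("loose", loose_domain)]:
    accepted = [m for m in (shop_mail, subdomain_mail, spam_mail)
                if is_whitelisted(m, patterns)]
    print(name, accepted)
```

None of the three entries accepts both shop addresses while rejecting the spammer, which is why the processing delay with a notification is offered as a third option.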
2.2.2.3 Sending Notifications in Reply to Spam
A limitation not considered at all in the Mail-Shake paper[19] is that Mail-Shake challenges or notifications are sent in reply to the receipt of spam mails. If the sender address is valid but forged, a challenge is sent to a user who did not request it. If that user is using Mail-Shake himself, the challenge is dropped without bothering him. If he is not using Mail-Shake, he receives one or in worse cases many unwanted notifications. These backscattered mails are of course unwanted and can even be considered spam. The consequences could be that rule-based spam fighting techniques are trained to filter out Mail-Shake challenges or that the MTA³ sending the challenges is put on a blacklist. This would mean that a user of Mail-Shake is either unable to send mails, or that users who want to send him mails are unable to go through the Mail-Shake authentication as the challenge mails are dropped automatically.
If the sender address of a spam mail is forged and not valid, the situation is slightly different. Mail-Shake tries to send a challenge to this address, but this cannot succeed as the address does not exist. The MTA responds with a delivery status notification mail informing the end user that the mail delivery failed. This notification is sent to the sender address of the Mail-Shake challenge, which is the public address. Of course Mail-Shake generates another challenge, sent to the address of the MTA. If the MTA does not accept mails, another delivery status notification is sent to the public address. At that point Mail-Shake is caught in a mail sending loop as illustrated in Figure 2.5. Each delivery status notification sent by the MTA causes another challenge mail, and each challenge mail causes another delivery status notification.
If notifications are sent in reply to mails received at the private address, the situation becomes more complex. Of course a mail loop can be triggered as well.
A problem is that delivery status notifications in general may not be dropped automatically, as they might be valid mails in case a mail sent by the user could not be delivered. Therefore Mail-Shake must be able to distinguish delivery status notifications sent in reply to a Mail-Shake mail from those sent in reply to a user mail.
With RFC 3464[44] a specification for the format of delivery status notifications (DSN) exists. Unfortunately not all MTAs implement this specification, although the first version (RFC 1894) was published in 1996. During the evaluation (compare Chapter 5) non-compliant delivery status notifications sent from Exim, qmail and the MTA of Google Mail have been received. While the notifications sent by the first one offer a minimal chance to be recognized as a notification, the latter ones do not. The notifications are normal plain text mails with the original, undelivered mail pasted into the text body. A compliant DSN uses a special MIME (Multipurpose Internet Mail Extensions) type⁴ and provides the undelivered mail as an attachment. Appendix A contains examples of both compliant and non-compliant delivery status notifications received during the evaluation.
Mail-Shake has to be able to recognize a DSN and must not send challenges or notifications in reply to the receipt of a DSN. Furthermore, at the private address Mail-Shake must only drop DSNs sent in reply to Mail-Shake notifications. The only way to recognize whether a DSN is in reply to a Mail-Shake challenge is the attached undelivered mail, which is specified as optional in RFC 3464. While this is in general positive as it blocks backscattered spam, for Mail-Shake it is a problem. Fortunately the evaluation showed that all standard-compliant DSNs either attach the complete mail or at least the header, which is sufficient to recognize a Mail-Shake mail. In the case of non-compliant notifications such as the one sent by Exim, there is only the choice to either drop all notifications or to allow all notifications, that is, the choice between false positives and false negatives.
³Mail Transfer Agent
⁴multipart/report
[Figure: Spam Bot sends spam email (with non valid sender address) to the Mail-Shake user; the Mail-Shake user sends a Challenge email to the MTA; the MTA returns a Delivery Status Notification for the undelivered Challenge email to the Mail-Shake user.]
Figure 2.5: Mail loop triggered by a spam mail with an invalid sender address
[Figure: message flow between User of Web Service, Web Service and Mail-Shake User.]
1. User sends message via Web Service
2. Web Service relays message as mail
3. Mail-Shake discards message
Figure 2.6: Web Service as relay of a mail
The handling of delivery status notifications as proposed in this Section weakens the Mail-Shake concept. It is possible to successfully send mails to the private address without the requirement to authenticate. A spammer would only have to disguise the spam as a DSN. In the case of a standard-compliant delivery status notification there is at least the chance that such mails can be recognized with extensions to the email client.
2.2.2.4 Web Services
Another limitation in the usability of Mail-Shake, which can be considered a variant of the communication with automated systems, was found during the evaluation: Mail-Shake is unable to handle mails sent from web services such as social networking services. Consider the case that User A is using Mail-Shake and User B is an authorized sender who knows the private address of User A. User B is also a user of social network Foo, while User A is not a user of that network. User B wants to invite User A to join that network. Therefore he gives User A’s private address to Foo and Foo sends an invitation mail to User A. This mail is of course dropped, as it uses an address of Foo that is not whitelisted and not the whitelisted address of User B. The web service is, so to speak, a mail relay which changes the sender address, as illustrated in Figure 2.6. If User B specifies User A’s public address it fails as well, as the challenge is sent to Foo, and as this is an automated system it cannot solve the challenge. In fact the evaluation showed that a bounce mail might be sent stating that you cannot send mails to that address. As this mail triggers another challenge mail, Mail-Shake and Foo are stuck in a mail sending loop similar to the one seen above in the case of DSNs.
In fact Mail-Shake behaves exactly the way the user expects it to work. Although the private address leaked, it is useless for the social network. The user’s privacy is still protected by Mail-Shake. Inviting someone to a social network or a similar web service should be done by using existing communication channels and not by giving private information, such as email addresses, to a third party. In this way Mail-Shake does not only prevent spam but also protects the user’s privacy.
This limitation does not only occur for social networks, but for all cases where a web service forwards a mail or sends an invitation to the service. So for example invitation mails for services like “Google Wave” are blocked as well.
To solve this limitation Mail-Shake could ship predefined whitelist entries to allow mails sent from such known services. Of course there must be a way to update this predefined whitelist for the case that new services are established or addresses change.
Purchases via the popular auction platform eBay fail as well, as the seller sends a mail to the buyer. The seller’s address is of course unknown to the buyer and is therefore not whitelisted. A possible workaround is to watch for mails at the time the purchase finishes or to only use the web frontend provided by the platform.
A similar problem occurs for Review Board⁵, a web-based code reviewing tool used for example by the KDE community. The web tool knows the addresses of all participants, and if User A opens a review request to User B, a mail is sent to User B from User A’s address. The header section does not contain any information which could be used to identify the mail as having been sent from Review Board. Appendix B provides an example of such a header section and one from a system with useful headers. As Review Board is open source software, the easiest solution is to propose a patch to include a special header in each mail.
2.2.3 Summary
In this Section the Mail-Shake concept has been presented. Mail-Shake protects an email account by using whitelists. All mails with a sender address which is not on the whitelist are blocked. To get an address on the whitelist a sender has to prove that he is a human and not an automated system by solving a CAPTCHA. For the authentication process each user of Mail-Shake has two addresses: a public and a private one. Each mail to the public address is answered with a mail containing the CAPTCHA, which reveals the private address.
The concept is of course not bulletproof, and some limitations of the concept and possible solutions to those were presented. The most severe problems are communication with automated systems and the handling of Delivery Status Notifications. The solutions to these limitations are discussed in more detail in the scope of this thesis.
⁵http://www.reviewboard.org/
2.3 The Spam Templates Concept
In this Section the concept of proactive spam filtering based on templates, as described in [25], is presented. The general idea is to generate templates by intercepting mails sent by spam bots. These templates are used to identify newly received mails as junk by matching the mail against the templates.
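Matching a mail against a template can be pictured as follows. This is a minimal sketch that assumes a template consists of one regular expression per header and per body line; the dictionary layout, the function name and the body patterns are hypothetical, while the Subject and X-Mailer patterns are loosely modelled on the generated template shown later in this Section.

```python
import re

# Hypothetical template: one regular expression per header and per body line.
TEMPLATE = {
    "headers": {
        "Subject": r"[!\-.'\s\w]{7,137}",
        "X-Mailer": r"Microsoft Outlook Express 6\.00\.2720\.3000",
    },
    "body": [
        r"Buy [\w\s]{3,40} now!!",
        r"http://[a-z]{5,12}\.example/[a-z0-9]{4,8}",
    ],
}


def matches_template(headers: dict, body_lines: list, template: dict) -> bool:
    """True if every header and every body line fully matches the template."""
    for name, pattern in template["headers"].items():
        if re.fullmatch(pattern, headers.get(name, "")) is None:
            return False
    if len(body_lines) != len(template["body"]):
        return False
    return all(
        re.fullmatch(pattern, line) is not None
        for pattern, line in zip(template["body"], body_lines)
    )


spam = matches_template(
    {"Subject": "Cheap watches",
     "X-Mailer": "Microsoft Outlook Express 6.00.2720.3000"},
    ["Buy cheap watches now!!", "http://abcdefg.example/x9y8"],
    TEMPLATE,
)
ham = matches_template(
    {"Subject": "Meeting tomorrow", "X-Mailer": "KMail 1.12"},
    ["See you at ten."],
    TEMPLATE,
)
```

Requiring a full match on every line keeps the check strict, in line with the goal that a template should not match legitimate mail.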
2.3.1 Template Based Spam Mails
Nowadays spam mails are mostly sent by botnets which control a large number of systems infected with malicious software (malware). These controlled systems, which are commonly known as bots or zombies, communicate with a control server to get the order to send out spam. Most large spam sending botnets, like the Storm Worm botnet and its successor the Waledac botnet[60], use a special technique to generate and send spam messages[59], as illustrated in Figure 2.7. The control server passes templates, which describe the structure of the spam messages to be sent, and meta-data such as recipient lists to the bots. The templates contain variable parts, which are filled in by the bots when sending out the messages, for example with URLs that are also received from the control server[34].
By intercepting the communication between the bots and the mail servers they connect to in order to send the spam messages, the templates can be reverse engineered. To intercept the communication, samples of malware are executed in a sandbox, a controlled environment. The malware, which is running on a native Microsoft Windows machine, is allowed to communicate with its control server to receive current templates and the list of target recipients. When the bot tries to start an SMTP⁶ connection, the connection is intercepted and redirected to a local mail server. The local mail server is the man-in-the-middle between the bot and the mail server the bot wanted to connect to. To trick the bot into believing that it is communicating with the actual mail server, the local one has to connect to the “real” MTA, grab the banner and relay it to the bot.
As the original template is passed from the control server to the bots, it seems to be a more elegant solution to intercept this communication instead of intercepting the SMTP communication (and reverse engineering the template). But gaining the original template might not be useful.
Each botnet uses its own template language, which can be, as for example the Storm botnet illustrates, a fairly elaborate template language with support for formatting macros, generation of random numbers, dates, etc.[34] These languages have to be reverse engineered and adjusted each time the botnet slightly changes the language, which renders the idea of spam fighting based on spam templates reactive. Another reason to reverse engineer the templates is that not all botnets distribute their templates to the bots. There are also botnets using a reverse proxy-based spamming technique[49]. The bot connects to the control server and establishes a reverse SOCKS proxy connection. The control server uses this tunnel to directly send out the spam messages without passing the template to the bot.
⁶Simple Mail Transfer Protocol
Figure 2.7: Template based spamming[25]
By intercepting all SMTP communication only current spam messages are gathered. This has the advantage that when a new spam campaign is started the mails are already present. Existing techniques which rely on feature extraction first have to gather many spam mails to be able to identify a new campaign, resulting in a high false negative rate at the start of a new spam campaign.
The idea of the Spam Templates concept is to generate the templates used by the bots. Therefore a bot is executed for a certain amount of time or until a certain number of messages have been collected. Afterwards the system is reset and a different sample of malware is executed to receive messages sent by another botnet.
2.3.2 Generation of Templates
After one bot has been executed, templates can be reverse engineered from the collected spam mails. The messages are sorted so that the longest message is processed first. The longest mail becomes the base template, and by merging it with the other mails a template is generated. If the merge with one mail results in a too generic template, the new one is discarded and the mail is moved back to the list of unprocessed mails. That guarantees that the template does not become too generic and that only mails which were generated from the original template end up in the template. As soon as all mails are processed, or only mails are left which would render the template too generic, the template generation process ends[25]. An example of a generated template is provided in Figure 2.8. Subject, X-Mailer header and each line of the message body are replaced by a regular expression.
A disadvantage of this template generation process is that the template can be too specific if
Subject\:\ ([\!\-\.\’\s\w]){7,137}\ X\-Mailer\:\ Microsoft\ Outlook\ Express\ 6\.00\.2720\.3000\ Body\:\ \#([\=\.\-\&\;\!\’\s\w]){20,152}\!\!\>\>\=09\ \.([A-Za-z]){14,14}Next\ Body\ Part\:\ \<\!DOCTYPE\ HTML\ PUBLIC\ \"\-\/\/W3C\/\/DTD\ HTML\ 4\.0\ Transitional\/\/EN\"\>\ \\
\ \html\;\ \=\ charset\=3Diso\-8859\-1\"\>\ \\ \