Implementation of Proactive Spam Fighting Techniques
Implementation of Proactive Spam Fighting Techniques
Master's thesis by Martin Gräßlin
Ruprecht-Karls-Universität Heidelberg
Supervisors: Prof. Dr. Gerhard Reinelt, Prof. Dr. Felix Freiling
3 March 2010

Declaration of Honor

I affirm that I wrote this Master's thesis independently, used only the cited sources and aids, and observed the principles and recommendations "Responsibility in Science" of Heidelberg University.

Place, Date
Martin Gräßlin

Abstract

One of the biggest challenges in global communication is to overcome the problem of unwanted emails, commonly referred to as spam. In recent years many approaches to reduce the number of spam emails have been proposed. Most of them have in common that the end-user is still required to verify the filtering results. These approaches are reactive: before mails can be reliably classified as spam, a set of similar mails has to be received. Spam fighting has to become proactive. Unwanted mails have to be blocked before they are delivered to the end-user's mailbox. In this thesis the implementation of two proactive spam fighting techniques is discussed. The first concept, called Mail-Shake, introduces an authentication step before a sender is allowed to send emails to a new contact. Computers are unable to authenticate themselves and so all spam messages are automatically blocked. The development of this concept is discussed in this thesis. The second concept, called Spam Templates, is motivated by the fact that spam messages are generated from a common template. If we gain access to the template we are able to identify spam messages by matching the message against the template. As the template is generated from currently sent spam messages, the template will never match a legitimate mail. In this thesis matching a mail against a template is implemented. In the scope of this thesis an evaluation for the Mail-Shake concept is provided.
This evaluation shows that Mail-Shake is able to reduce the number of received spam messages and mails containing malicious software.

Acknowledgement

First of all I want to thank Professor Gerhard Reinelt and Professor Felix Freiling for making it possible for me to write this thesis at the Laboratory for Dependable Distributed Systems at the University of Mannheim. I also want to thank my supervisors Jan Göbel and Philipp Trinius. Their suggestions and feedback are very much appreciated and helped to develop the system presented in this thesis. A special thanks to all my friends and my family for testing the system and providing valuable feedback on its usability. I especially want to thank Arthur Arlt who was always willing to discuss details about the implementation and this document.

I want to thank the KDE community and Qt Development Frameworks for providing such a great and coherent development framework. The KDE community has helped me improve my C++ coding skills during the last years. This was useful during the implementation as many problems were already known and could be solved easily. In general I want to thank the complete Free and Open Source community. Without their ideas of free software it would not have been possible to realize such a project. The complete project including this document has been implemented and written with the help of Free or Open Source software.

Last but not least I want to thank my parents for their financial support during my Master studies so that I could concentrate on my classes.

Contents

1 Introduction
  1.1 Motivation
  1.2 Proactive Spam Fighting Techniques
  1.3 Notes About the Implementation
  1.4 Structure of This Thesis
2 Proactive Spam Fighting
  2.1 Related Work
    2.1.1 Bayesian Filtering
    2.1.2 DNS Blacklists
    2.1.3 URI Blacklist
    2.1.4 Greylisting
    2.1.5 Conclusion
  2.2 The Mail-Shake Concept
    2.2.1 Proactive Spam Fighting With Dynamic Whitelists
    2.2.2 Limitations of the Mail-Shake Concept
    2.2.3 Summary
  2.3 The Spam Templates Concept
    2.3.1 Template Based Spam Mails
    2.3.2 Generation of Templates
    2.3.3 Proactive Filtering
    2.3.4 Summary
3 Background
  3.1 Evaluation of Current CAPTCHA Techniques
    3.1.1 Introduction
    3.1.2 Simple Obfuscation
    3.1.3 Image Based CAPTCHAs
    3.1.4 Audio Based CAPTCHAs
    3.1.5 Image Recognition CAPTCHAs
    3.1.6 Riddle
    3.1.7 reCAPTCHA
    3.1.8 Conclusion
  3.2 Excursus: Breaking a CAPTCHA System
    3.2.1 The Scr.im CAPTCHA System
    3.2.2 Flaws in the Design of the Scr.im CAPTCHA System
    3.2.3 Attack on the CAPTCHA System
    3.2.4 Lessons Learned
  3.3 Akonadi
    3.3.1 Client Plugins Compared to Central Storage
    3.3.2 Akonadi as the Central Storage Solution
    3.3.3 Design of Akonadi
    3.3.4 Summary
4 Development of the Systems
  4.1 Software Requirements for Mail-Shake
    4.1.1 Answering Spam Messages
    4.1.2 Delivery Status Notifications
    4.1.3 Public Mail Address
    4.1.4 Sending Mails
    4.1.5 Private Mail Address
    4.1.6 Summary
  4.2 Design of Mail-Shake
    4.2.1 Client Independent Library
    4.2.2 Akonadi Agent
    4.2.3 Client Integration
    4.2.4 Summary
  4.3 Implementation of Mail-Shake
    4.3.1 Mail-Shake Library
    4.3.2 Mail-Shake Akonadi Agent
    4.3.3 Mail-Shake Integration in Email Clients
  4.4 Implementation of Spam Templates
    4.4.1 Generating the RSS Feed
    4.4.2 Testing a Mail
    4.4.3 Summary
5 Evaluation
  5.1 Mail-Shake Evaluation Setup
  5.2 Results of Mail-Shake Evaluation
  5.3 Greylisting
  5.4 Results from January
  5.5 Results from February
  5.6 Summary
6 Retrospection and Future Tasks
  6.1 Problems Caused by Akonadi
  6.2 Future Tasks for Spam Templates
  6.3 Future Tasks for Mail-Shake
    6.3.1 Handling of Delivery Status Notifications
    6.3.2 Mail-Shake for Several Addresses
    6.3.3 Solving Mail-Shake Challenges in Email Clients
    6.3.4 Integrating Mail-Shake Directly Into Email Clients
  6.4 CAPTCHA Security
7 Conclusion
A Examples of Delivery Status Notifications
  A.1 RFC Compliant
  A.2 Exim
  A.3 QMail
    A.3.1 MIME Mail
    A.3.2 Plain Text Mail
  A.4 Google Mail
B Mails from Automated Systems
  B.1 Review Board
  B.2 Bugzilla
C Mail-Shake API Documentation
  C.1 MailShake Namespace Reference
    C.1.1 Detailed Description
    C.1.2 Typedef Documentation
    C.1.3 Enumeration Type Documentation
  C.2 MailShake::DSN Class Reference
    C.2.1 Detailed Description
    C.2.2 Member Function Documentation
  C.3 MailShake::DSNPrivate Class Reference
  C.4 MailShake::EMail Class Reference
    C.4.1 Detailed Description
    C.4.2 Member Function Documentation
  C.5 MailShake::EMailPrivate Class Reference
  C.6 MailShake::Id Class Reference
    C.6.1 Detailed Description