Mastering Spam: a Multifaceted Approach with the Spamato Spam Filter System
Total Page:16
File Type:pdf, Size:1020Kb
Research Collection Doctoral Thesis Mastering Spam: A Multifaceted Approach with the Spamato Spam Filter System Author(s): Albrecht, Keno Publication Date: 2006 Permanent Link: https://doi.org/10.3929/ethz-a-005339283 Rights / License: In Copyright - Non-Commercial Use Permitted This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use. ETH Library DISS. ETH NO. 16839 Mastering Spam A Multifaceted Approach with the Spamato Spam Filter System A dissertation submitted to the SWISS FEDERAL INSTITUTE OF TECHNOLOGY ZURICH for the degree of Doctor of Sciences presented by KENO ALBRECHT Dipl. Inf. born June 4, 1977 citizen of Germany accepted on the recommendation of Prof. Dr. Roger Wattenhofer, examiner Prof. Dr. Gordon V. Cormack, co-examiner Prof. Dr. Christof Fetzer, co-examiner 2006 Abstract Email is undoubtedly one of the most important applications used to com- municate over the Internet. Unfortunately, the email service lacks a crucial security mechanism: It is possible to send emails to arbitrary people with- out revealing one’s own identity. Additionally, sending millions of emails costs virtually nothing. Hence over the past years, these characteristics have facilitated and even boosted the formation of a new business branch that advertises products and services via unsolicited bulk emails, better known as spam. Nowadays, spam makes up more than 50% of all emails and thus has become a major vexation of the Internet experience. Although this problem has been dealt with for a long time, only little success (measured on a global scale) has been achieved so far. Fighting spam is a cat and mouse game where spammers and anti-spammers regularly beat each other with sophisticated techniques of increasing complexity. While spammers try to bypass existing spam filters, anti-spammers seek to detect and block new spamming tricks as soon as they emerge. In this dissertation, we describe the Spamato spam filter system as a mul- tifaceted approach to help regain a spam-free inbox. Since it is impossible to foresee future spam creation techniques, it is important to react quickly to their development. Spamato addresses this challenge in two ways. First, it has been designed to simplify the integration of multiple spam filters. By combining their different capabilities, a joint strike against spam promises the detection of more harmful messages than any individual solution could achieve. And second, we actively support collaborative spam filters that har- ness the collective knowledge of participating users. Such filters are therefore capable of learning about and eliminating new types of spam messages at an early stage. Hundreds of participants use Spamato everyday. The results presented in this dissertation provide insights into real-world operating environments rather than test bed scenarios often used in other projects. The results underpin our thesis that a concerted and collaborative filtering approach is an effective weapon in the arms race against spam. Zusammenfassung Das Versenden von Emails ist zweifellos eine der bedeutensten Anwendun- gen, um ¨uber das Internet zu kommunizieren. Leider weist der Email-Dienst eine entscheidende Sicherheitsl¨ucke auf, die es erm¨oglicht, Emails an beliebige Personen zu verschicken, ohne die eigene Identit¨at preiszugeben. Außerdem ist das Versenden von Nachrichten nahezu kostenlos. Diese beiden Kriterien haben in den letzten Jahren entscheidend dazu beigetragen, dass sich eine neue Branche entwickeln konnte, in der mit unerw¨unschten Werbe-Emails – besser bekannt als Spam – versucht wird, Produkte und Dienstleistungen zu vermarkten. Heutzutage macht Spam mehr als 50% aller Emails aus und ist somit zu einem der gr¨oßten Argernisse¨ im Internet-Alltag geworden. Obwohl an die- sem Problem bereits seit einiger Zeit gearbeitet wird, konnten bisher (global betrachtet) kaum sichtbare Erfolge erzielt werden. Die Spam-Bek¨ampfung stellt sich als eine Art Katz-und-Maus-Spiel dar, in dem Spammer und Anti- Spammer sich gegenseitig mit immer ausgefeilteren Methoden konfrontieren. W¨ahrend Spammer nach M¨oglichkeiten suchen, um bestehende Spamfilter zu umgehen, versuchen Anti-Spammer, neue Spam-Tricks m¨oglichst schnell zu erkennen und zu bek¨ampfen. In dieser Dissertation beschreiben wir das Spamato Spamfilter-System als einen vielseitigen Ansatz, um wieder Kontrolle ¨uber die eigene Inbox zu erlangen. Da es unm¨oglich ist, die zuk¨unftige Entwicklung neuer Spam- Nachrichten vorherzusehen, ist es umso wichtiger, diesen m¨oglichst schnell entgegenwirken zu k¨onnen. Spamato stellt sich dieser Herausforderung in zweierlei Hinsicht. Zum einen wurde Spamato so konzipiert, dass die Inte- gration mehrerer Spamfilter vereinfacht wird. Die Kombination ihrer ver- schiedenen F¨ahigkeiten erlaubt ein gemeinsames Vorgehen gegen Spam, das eine h¨ohere Erkennungsrate als die Verwendung einzelner Filter verspricht. Und zum anderen unterst¨utzen wir kollaborative Spamfilter, die das kollek- tive Wissen von Benutzern zum Einsatz bringen. Solche Filter sind daher in der Lage, neue Arten von Spam-Nachrichten fr¨uhzeitig zu erkennen und zu beseitigen. Hunderte von Benutzern verwenden Spamato regelm¨aßig. Die in dieser Arbeit aufgef¨uhrten Resultate geben einen guten Einblick in das reale Ein- satzumfeld der Spamfilter, anstatt nur Testbett-Szenarien zu beleuchten, wie es in vielen anderen Projekten der Fall ist. Die Resultate untermauern un- sere These, dass ein vereintes und kollaboratives Vorgehen gegen Spam eine effektive Maßnahme darstellt. Acknowledgements This thesis would not have been possible without the support of a mul- titude of people. First of all, I would like to thank my advisor Roger Wat- tenhofer for his help and guidance throughout my thesis. Starting as the “practice guy” in your group gave me the opportunity to learn that also real-world systems are usually backed by profound theories. Although our P2P game ideas never made it to a useful application, the encouraging and valuable discussions with you finally led to what became this thesis. I would also like to acknowledge the work of my co-examiners Christof Fet- zer and Gordon V. Cormack. Thanks for investing the time to read through the thesis and for the inspiring comments. I am very grateful to all my fellow colleagues working in the Distributed Computing Group. I know that it must have been hard to domesticate one of the last Kenos living in the wild, although you never managed to control my exceptional way of ingestion (one and a half peaches is nothing) and communication (it is art, not whistling). As a long-term member of DCG, it is really hard to keep track of all the people I “should” acknowledge in this paragraph. Maybe you are so kind to condone my lack of enthusiasm to do so. Just kidding. Of course, I would like to thank Ruedi Arnold for teaching me the weirdest Schwitzerd¨utsch dialect and for being a good and motivating companion all the time. It was a real pity when you left the building to find your way to wherever (sorry, no space for a nice picture). I would like to thank Nicola(s) Burri, who had to withstand my favorite Christmas songs 18 long months while we were working in the same office, for writing Spαmαto and fun-in-the-office (and for constantly challenging me with nastiness); Aaron Zollinger for many interesting discussions that went far beyond Swiss cantons and spam (and for the infamous Pipi-paper); Fabian Kuhn for staying longer in the office than I did (and for the insight that real genius needs beer, pepper, and blick.ch); Regina O’Dell for demonstrating that job & family can be managed with ease even if not everything seems to be perfect (and for sharing my passion on food creation, meat rulez!); Pascal von Rickenbach for being a source of inspiration for all kinds of design questions (and for leaving always early so that we did not have to talk on train); Thomas Moscibroda for giving me the opportunity to mention at least one meaningful formula + 2 τ(vs) α A(Rλ ) θv(3nβθ ) ·(2rs) in this thesis: Iλ ≤ A(D ) · 2 (and for loving the brilliant i λ 1 µθ α r α ((( − 2 ) −1) s) B-boys); and Mirjam Wattenhofer for letting me play with your children although you must have known about my bad influence on minors (and for offering me a room in your flat on my very first days in Zurich). Furthermore, I would like to extend my gratitude to Roland Flury who probably does not know how grateful everybody is about what he is doing for the group, Stefan Schmid who introduced me to Marvin, Yvonne Anne Oswald who we never saw dancing, Thomas Locher who regularly asked me about Java problems nobody else will ever experience, Remo Meier who would arguably be able to re-write Spamato in seven days, Michael Kuhn who also likes beer, and Olga Goussevskaia who left Russia and Brazil to learn Schwitzerd¨utsch. My final DCG-thanks should reach Andreas Wetzel for being the best co- admin of the Spamato project one could think of, Yves Weber for his lasting impression on our group and Spamato, and Thierry Dussuet for managing all the computer stuff much better than I ever did. I advised numerous students throughout my PhD time whom I would like to acknowledge here for their creative work, interesting discussions, and for sharing their Swiss mindset: David B¨ar, Marco Cicolini, Beatrice Huber, Micha Trautweiler, Martin Meier, Thierry B¨ucheler, Andr´e Bayer, Stephan Schneider, Bernhard St¨ahli, Charly Wilhelm, Guido Frerker, Kliment Sto- janov, Markus Egli, Daniel Hottinger, Till Kleisli, Fabio Lanfranchi, Ro- man Metz, Franziska Meyer, Lukas Oertle, and Andreas Pfenninger. Special thanks go to Gabor Cselle, your theses were probably the most interesting, challenging, and enjoyable. My thesis would not be complete if no users ran Spamato; thanks to all anonymous participants who made my thesis a real success.