Improving Filtering of Email Phishing Attacks

by Using Three-Way Text Classifiers

Alberto Treviño

A thesis submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of

Master of Science

J. J. Ekstrom, Chair
Dale C. Rowe
Eric K. Ringger

School of Technology

Brigham Young University

April 2012

Copyright © 2012 Alberto Treviño

All Rights Reserved

ABSTRACT

Improving Filtering of Email Phishing Attacks by Using Three-Way Text Classifiers

Alberto Treviño
School of Technology, BYU
Master of Science

The Internet has been plagued with endless spam for over 15 years. However, in the last five years spam has morphed from an annoying advertising tool into a social engineering attack vector. Much of today's unwanted email tries to deceive users into replying with passwords or bank account information, or into visiting malicious sites which steal login credentials and spread malware. These email-based attacks are known as phishing attacks. Much has been published about these attacks, which try to appear legitimate not only to users but also to spam filters. Several sources indicate traditional content filters have a hard time detecting phishing attacks because the emails lack the traditional features and characteristics of spam messages. This thesis tests the hypothesis that by separating the messages into three categories (ham, spam and phish) content filters will yield better filtering performance. Even though experimentation showed three-way classification did not improve performance, several additional premises were tested, including the validity of the claim that phishing emails are too much like legitimate emails and the ability of Naive Bayes classifiers to properly classify emails.

Keywords: Alberto Trevino, email, spam filtering, phish, phishing attacks, support vector machines, maximum entropy, naive bayes, bayesian filters

ACKNOWLEDGMENTS

Special thanks go to my wife and my children. Their support and the desire to provide a better life for them were strong motivators when the work got difficult. I give another special thanks to my wife for putting up with me and doing everything I neglected to do for my family while writing this thesis. A special thanks goes to my parents, who expected great things from me. Finally, a very respectful and special thanks goes to Dr. Ekstrom for believing in my potential and for fighting for my admission into the program. I also thank my two other committee members for their continual support and their ideas to make this thesis worthy of acceptance.

TABLE OF CONTENTS

List of Tables
List of Figures
1 Introduction
1.1 Background
1.2 Problem Statement
1.3 Hypothesis
1.4 Justification
1.5 Assumptions
1.6 Delimitations
2 Review of the Literature
2.1 Introduction to the Literature
2.2 Understanding Email Communications
2.2.1 Overall Delivery Process
2.2.2 The SMTP Session
2.2.3 Sending Spam
2.3 History of Spam
2.4 Types of Spam
2.4.1 Unsolicited Commercial Email
2.4.2 Chain Letters
2.4.3 Scams
2.4.4 Malware Proliferation
2.4.5 Phishing Attacks
2.5 Proposed Solutions
2.5.1 Keywords, Phrases and Regular Expressions
2.5.2 Blacklists
2.5.3 Whitelists
2.5.4 Heuristics
2.5.5 Machine Classifiers
2.5.6 Challenge and Response Systems
2.5.7 Greylisting
2.5.8 Tarpitting
2.5.9 Sender Policy Framework (SPF)
2.5.10 Sender ID
2.5.11 Relay Header Detection
2.5.12 DomainKeys Identified Email
2.5.13 DMARC
2.5.14 Legislation
2.5.15 SMTP Proxies
2.6 Current State of the Commercial Industry
2.6.1 Barracuda Networks
2.6.2 MessageLabs
2.6.3 Google Gmail
2.6.4 MailRoute
2.7 Conclusions on the Literature Review
3 Research Procedures
3.1 Finding Room for Improvement
3.2 Overall Approach
3.3 Building the Necessary Corpora
3.4 Feature Extraction
3.5 Training and Classification
3.6 Finding a Classifier
3.6.1 Maximum Entropy
3.6.2 Support Vector Machines
3.6.3 Choosing the Classifier
3.7 Measuring Performance
3.7.1 Confusion Matrix
3.7.2 Accuracy
3.7.3 Precision
3.7.4 Recall
3.7.5 Student's t-test, p-value and Statistical Significance
4 Results and Data Analysis
4.1 Building the Corpora
4.1.1 Splitting the Corpora
4.2 Extracting Features
4.2.1 Unigrams, Bigrams and Trigrams
4.2.2 Header Features
4.2.3 Splitting Text and HTML Message Parts
4.2.4 Using the Start and End of the Message Body
4.2.5 Phish-Specific Features
4.2.6 Message Parsing Error Features
4.2.7 Corrections to the Corpora
4.2.8 Feature Summary
4.3 Maximizing SVM Performance
4.4 Confirming the Phishing Problem
4.5 Results of the Two-Way Classifier
4.6 Results of the Three-Way Classifier
4.7 Further Analysis of the Results
4.7.1 Corpora
4.7.2 Generative vs. Non-generative Classifiers
4.7.3 Feature Selection and Reduction
4.8 Validating the Results
5 Conclusions and Recommendations
5.1 Conclusions
5.2 Recommendations
5.3 Future Work
References
Appendix A: Feature Sets
Appendix B: Raw Classifier Results
Appendix C: Classifier Performance Comparisons
Appendix D: Performance Data for P-Value Calculations
Appendix E: Source Code

LIST OF TABLES

Table 2-1: Minimum Set of Commands Required to Implement an SMTP Server and Their Descriptions
Table 3-1: Sample Character N-grams
Table 3-2: Binary Confusion Matrix
Table 3-3: Binary Confusion Matrix with Alternate Terminology
Table 4-1: Number of Messages in Corpora
Table 4-2: Split Corpora Group Breakdown
Table 4-3: Binary URL Features
Table 4-4: Final Corpora Group Breakdown
Table 4-5: Classifier Performance Comparison in a Two-Class Configuration
Table 4-6: Performance Comparison of Two-way vs. Three-way Classifiers with Class-specific Recall (c = 500)
Table 4-7: Number of Messages in Corpora

LIST OF FIGURES

Figure 2-1: Email Delivery Process
Figure 2-2: Example SMTP Transmission
Figure 3-1: SVM Hyperplane Classification
Figure 3-2: SVM Margin and the Hyperplane
Figure 4-1: Performance Comparison of Unigram, Bigram and Trigram Distributions for Ham and Spam
Figure 4-2: Performance Comparison of Features from Headers and Body
Figure 4-3: Performance with Multi-part Counts and Splitting Multi-part Messages
Figure 4-4: Performance Comparison Between Full and Short Body
Figure 4-5: Execution Run Time Comparison Between Full and Short Body
Figure 4-6: Performance of URL Features
Figure 4-7: Phish Message 1352
Figure 4-8: Classification Performance for Various Values of c
Figure 4-9: Running Time for Various Values of c
Figure 4-10: Blind Phish Classification Results
Figure 4-11: Blind Phish Classification with URL Features
Figure 4-12: Accuracy Range and Skew of Blind Phish Classifier
Figure 4-13: Classifier Performance for Ham, Spam and Phish in Traditional 2-Way Configuration
Figure 4-14: Comparison of Classifier Performance With and Without Phish Corpus
Figure 4-15: Performance of the Three-Way Classifier (c = 500)
Figure 4-16: Performance of Two-way vs. Three-way Classifiers (Headers + Short Body + URL, c = 500)
Figure 4-17: Running Time Comparison Between Two-way and Three-way Classification
Figure 4-18: Comparison of Mean and Variance of the Accuracy of Two-way vs. Three-way Classifiers
Figure 4-19: Performance Comparison on the Effect of Headers for Ham vs. Spam + Phish (c = 500)
Figure 4-20: Accuracy Comparison Between Generative and Non-generative Models (Ham vs. Spam)
Figure 4-21: Accuracy Comparison Between Generative and Non-generative Models (Ham vs. Spam + Phish)
Figure 4-22: Accuracy Comparison Between Generative and Non-generative Models (Ham vs. Spam vs. Phish)
Figure 4-23: Performance Variance for Naive Bayes, Maximum Entropy and SVM Using Different Feature Sets
Figure 4-24: Performance of Two-way vs. Three-way Classifiers with the 2005 TREC Corpus (Headers + Short Body + URL, c = 500)
Figure 4-25: Comparison of Mean and Variance of the Accuracy of Two-way vs. Three-way Classifiers for the 2005 TREC Corpus

1 INTRODUCTION

1.1 Background

Spam, or bulk unsolicited (often commercial) email, has become a large problem on the Internet. Spam consumes network bandwidth, drastically increases load on mail servers, consumes storage space, overwhelms users (the most visible and annoying problem) and provides a delivery vector for security exploits.

The first piece of spam appeared on the ARPANET on May 1, 1978. Shortly after, small amounts of spam and chain letters began to appear. But it was not until 1994, when spamware (software which automated the delivery of high volumes of spam) became available, that the spam we know and endure today began. Today around 89% of all email is unwanted and considered spam (MAAWG, 2011). Part of the problem has been how easy spam is to produce and deliver. With today's spamware and botnets, a spammer can easily deliver hundreds of thousands of messages with very little effort.

Another factor contributing to the inundation of spam is the Internet standards (RFCs), which had their origins in the 80s and 90s. Even though they prohibit misrepresenting the origins of an email message, they did not provide a way of validating that data until recently. This allowed spammers to hide in a cloak of forgery and deceit which made it hard for Internet providers to find and shut them down.

With the creation of organized anti-spam efforts in 1998, the Internet began to fight back against spam. Unfortunately, nearly a decade and a half later, the fight is far from over.

The first decade of the new millennium has seen the spam battle take on new dimensions. An increasing amount of today's spam has a more sinister purpose than promoting questionable goods and services. Some spam contains or links to many types of malware (viruses, backdoors, trojans, etc.). Some spam, pretending to come from a reputable source, asks for personal information such as name, address, bank account numbers, credit card information and even usernames and passwords. This type of spam has been labeled phishing email or phishing attacks. The term phishing 'originally comes from the analogy that early Internet criminals used email lures to "phish" for passwords and financial data from a sea of Internet users' (Ollmann, n.d.).

Phishing attacks pose a serious security problem that has been difficult to deal with. To be effective, phishing attacks must look like legitimate messages from a bank, company or acquaintance of the victim. Many sources suggest filtering phishing attacks requires different methods than those used for spam, since phishing emails lack features often found in common spam messages (Bergholz, 2008; Fette, 2007; Zdziarski, 2006), but there is little consensus on the approach.

1.2 Problem Statement

Security problems caused by phishing attacks are among the most urgent issues facing system administrators today. During the first half of 2011, there were over 26,000 unique phishing attacks reported (APWG, 2011). The literature suggests the attackers' success in making the emails look legitimate confuses content filters, which then mark the email as legitimate. If phishing emails thwart filters by looking less like spam, then perhaps content filters need to be taught to recognize phish as a separate type of email in order to make them more effective. Most text classifiers (the technology employed in content filters) are capable of classifying text into one of several predefined categories. It should therefore be possible to use multiple categories beyond spam/not spam to train filters to specifically detect what spam, phish and legitimate emails look like in an effort to mitigate the security risk.

1.3 Hypothesis

It is the hypothesis of this thesis that a three-way text classifier can be created and trained to classify incoming email into three separate categories: ham (legitimate), phish and spam, and that through this process the classifier should be able to learn more distinctive characteristics of what constitutes legitimate email, phishing attacks and spam, and therefore improve filtering. In particular, this approach should avoid labeling phishing emails as legitimate simply because they lack the spam features required to label them as spam.

To test this hypothesis I will create labeled corpora of ham, phish and spam. I will also build a feature extractor to extract the elements of each message from these corpora to train and test the classifier. To compare the performance of the three-way classifier, the data will also be fed to the text classifier in a traditional two-category configuration where the phish and spam corpora are combined into one large corpus. The filter's performance in the two-way and three-way configurations will be compared with three text classification measures (accuracy, precision and recall) to determine whether a three-way classifier offers additional performance benefits over the traditional two-way classifier.
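To make the comparison concrete, the following is a minimal sketch of the two configurations, not the pipeline used in this thesis (which has its own feature extractor and classifier). It uses scikit-learn as a stand-in, and the three placeholder messages are hypothetical:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy placeholder corpus; the thesis uses large public ham/spam/phish corpora.
texts = ["lunch meeting moved to noon", "cheap meds, buy now!!!",
         "your account is locked, verify your password here"] * 20
labels = ["ham", "spam", "phish"] * 20

def evaluate(texts, labels):
    vec = CountVectorizer()                   # word-unigram features
    X = vec.fit_transform(texts)
    clf = LinearSVC().fit(X, labels)          # linear SVM classifier
    pred = clf.predict(X)                     # scored on training data, for illustration only
    return (accuracy_score(labels, pred),
            precision_score(labels, pred, average="macro"),
            recall_score(labels, pred, average="macro"))

print("three-way:", evaluate(texts, labels))  # ham vs. spam vs. phish
two_way = ["spam" if y == "phish" else y for y in labels]
print("two-way:  ", evaluate(texts, two_way)) # phish folded into spam

The only difference between the two runs is whether the phish label is kept or collapsed into spam; everything else (features, classifier, metrics) is held constant, which is the same control used in the experiments.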

1.4 Justification

Even though the literature contains many examples of using text classifiers to identify spam and phishing attacks in serial configurations (a spam filter followed by a phish filter), there does not seem to be any report of a single three-way classifier being used to deal with spam and phish simultaneously. This simultaneous approach should provide filtering performance similar to the multistage filters already proposed while reducing the computational power required to filter spam and phishing attacks.

1.5 Assumptions

To be consistent with much of the literature, and to encourage reproducibility, publicly available corpora will be used for ham, spam and phish emails. These corpora will be assumed to be representative of messages found today in the wild.

The feature extraction process will be generic enough that it can be used with any other corpora.

1.6 Delimitations

Even though the meaning of the term spam has evolved to include unsolicited messages to blogs, websites and text messages, this thesis will focus strictly on email messages. No effort will be made to extrapolate the techniques or results in this thesis to other forms of spam.

This thesis will deal exclusively with bulk spam and phish messages. Spear phishing, a highly targeted and focused attack (APWG, 2011), will not be analyzed.

This thesis will not provide statistically meaningful performance comparisons of the many text classification algorithms. The literature is full of such comparisons, and this thesis will not attempt to add to them. The criteria for selecting a classifier will be adequate performance, the ability to handle multiple categories (or classes), and free availability for research.

2 REVIEW OF THE LITERATURE

2.1 Introduction to the Literature

Much has been published about spam in the academic, commercial and Internet spaces. There are several trade and educational conferences dedicated to the subject, and many individuals have posted their solutions and frustrations on the Internet. People have even built profitable businesses around providing spam filtering to other companies and organizations.

The literature converges on three main aspects of spam filtering: standards, heuristics (design and improvement), and machine learning algorithms. Interestingly, very little is published about how these three aspects should be integrated or used in conjunction with each other. Marketing information for several products suggests these aspects are being integrated and combined, but the companies do not provide details on how it is being done.

The approach I have taken in my literature review is to analyze the literature to find a research angle that has not yet been published. This requires a full understanding of the standards, the history and evolution of spam, and the main solutions that have been proposed and are in use today. This approach provides a solid foundation from which one can begin to understand the problem, recognize the limits of current technologies and identify new areas of research.

2.2 Understanding Email Communications

The email standard which forms the foundation of today's email infrastructure was drafted and published in the form of RFC 821 in August 1982. The standard is titled "Simple Mail Transfer Protocol," abbreviated as SMTP (RFC 821). RFC 821 is not a complete system; it only specifies server-to-server email communication. The entire email system requires at least three other standards in order to work:

• RFC 822: Standard for ARPA Internet Text Messages.

• RFC 974: Mail Routing and the Domain System.

• RFC 918: Post Office Protocol.

RFC 822 specifies how email messages and email addresses are to be constructed. RFC 974 specifies how email servers find each other to deliver email across domains. RFC 821 explains how email servers talk to one another to deliver email from one server to another. And RFC 918 describes how a user can retrieve email from a mailbox.

2.2.1 Overall Delivery Process

In order to understand email communications, we need to define the components that play a part in the process. These components are specified in RFC 5068:

• Mail User Agents (MUA)

• Mail Submission Agents (MSA)

• Mail Transfer Agents (MTA)

• Mail Delivery Agents (MDA)

The overall process can be described as follows. A user authors a message in the Mail User Agent (MUA). Once authored, the MUA sends the message to its configured Mail Submission Agent (MSA). The MSA will query the Mail Exchange (MX) records in DNS for the host where the email will be delivered. Once found, the MSA will contact the responsible Mail Transfer Agent (MTA) and transmit the message via SMTP. Once accepted and received by the MTA, the MTA will pass the message to the Mail Delivery Agent (MDA), where it will be stored in the appropriate mailbox(es) of the recipient(s). From there, the recipient(s) will obtain the message via POP or other means through his/her MUA, where the message will be read (RFC 5068). The Kavi Help Center provides a diagram of this process, shown in Figure 2-1.

Figure 2-1: Email Delivery Process
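As an aside, the MX lookup the MSA performs is easy to reproduce. The sketch below is illustrative only (it assumes the third-party dnspython package and the reserved example.com domain) and is not part of the delivery standards themselves:

import dns.resolver  # third-party package: dnspython

# Ask DNS which hosts accept mail for a domain, lowest preference first --
# the same MX query an MSA issues before contacting the remote MTA.
answers = dns.resolver.resolve("example.com", "MX")
for rr in sorted(answers, key=lambda r: r.preference):
    print(rr.preference, rr.exchange)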

One crucial difference between today's email delivery practices and those of the early Internet and ARPANET is relaying. During the early days of the ARPANET, automatic, routed connections were not possible between many of the available nodes. It was therefore a common practice to relay email through several MTAs from one server to the next. Thus, an email message may have passed through several SMTP servers before reaching its final destination.

This crude but necessary practice could have made the tracing and debugging of email delivery problems very difficult. However, the first RFCs (821 and 822 in particular) were designed to accommodate logging and tracing of the many paths the email may have taken. In Section 4.1.1, RFC 821 states:

When the receiver-SMTP accepts a message either for relaying or for final delivery it inserts at the beginning of the mail data a time stamp line. The time stamp line indicates the identity of the host that sent the message, and the identity of the host that received the message (and is inserting this time stamp), and the date and time the message was received. Relayed messages will have multiple time stamp lines. When the receiver-SMTP makes the "final delivery" of a message it inserts at the beginning of the mail data a return path line. The return path line preserves the information in the <reverse-path> from the MAIL command. Here, final delivery means the message leaves the SMTP world. Normally, this would mean it has been delivered to the destination user, but in some cases it may be further processed and transmitted by another mail system (RFC 821).

It is important to note that none of the email standards provide mechanisms for verifying the route taken by an email message. The "Received" and "Return-Path" headers were designed to provide this information as a debugging tool. They were never meant to be a forensic tool.

2.2.2 The SMTP Session

The Simple Mail Transfer Protocol is the heart of email delivery. It is often used by the MSA to start message delivery, and it is used by all public MTAs to deliver email throughout the Internet. SMTP is the glue that binds all email delivery.

The SMTP session is a lot like a telephone conversation. A typical SMTP session starts with the two sides exchanging greetings. Then, the SMTP client (the system sending the email message, usually the MSA) specifies who is sending the email, followed by a list of recipients. Once the source and destinations have been exchanged, the actual mail message is transferred, at which point the receiving end (the SMTP server) either acknowledges the reception of the email or rejects it, either temporarily or permanently. The SMTP session is, to a degree, human readable. Figure 2-2 shows a modern, sample SMTP session as specified in RFC 5321.

This SMTP example shows mail sent by Smith at host bar.com, and to Jones, Green, and Brown at host foo.com. Here we assume that host bar.com contacts host foo.com directly. The mail is accepted for Jones and Brown. Green does not have a mailbox at host foo.com.

S: 220 foo.com Simple Mail Transfer Service Ready
C: EHLO bar.com
S: 250-foo.com greets bar.com
S: 250-8BITMIME
S: 250-SIZE
S: 250-DSN
S: 250 HELP
C: MAIL FROM:<Smith@bar.com>
S: 250 OK
C: RCPT TO:<Jones@foo.com>
S: 250 OK
C: RCPT TO:<Green@foo.com>
S: 550 No such user here
C: RCPT TO:<Brown@foo.com>
S: 250 OK
C: DATA
S: 354 Start mail input; end with <CRLF>.<CRLF>
C: Blah blah blah...
C: ...etc. etc. etc.
C: .
S: 250 OK
C: QUIT
S: 221 foo.com Service closing transmission channel

Figure 2-2: Example SMTP Transmission

RFC 821 and its subsequent revisions (2821 and 5321) specify that a server should be rather forgiving, following the Internet tradition of making communications as usable and reliable as possible. For example, commands are not case-sensitive, and any command errors should be properly handled. Even though some rules are fairly strict (for example, the command verbs), the predicates of such verbs are relaxed (RFC 5321). Here is a summarized list of the minimum commands that should be supported by an SMTP server, and how they are to be transmitted by the client.

Table 2-1: Minimum Set of Commands Required to Implement an SMTP Server and Their Descriptions

EHLO (extended hello, or HELO for backward compatibility): Must be the first command sent by a client, with the Fully Qualified Domain Name (FQDN) of the client or its IP address.

VRFY: Checks (verifies) whether an email address is valid for a particular domain (RFC 821). RFC 5321 suggests this feature might be turned off on most systems, since it is believed to be abused by spammers to harvest email addresses, but the same RFC emphasizes that it should be enabled and properly verify addresses.

MAIL FROM: Specifies the email address responsible for sending the email. This gets recorded as the Return-Path in the headers by the receiving SMTP server.

RCPT TO: Specifies the recipients for the message. Multiple addresses can be given, and they can be individually accepted or rejected by the SMTP server.

DATA: Sends the headers and body of the email. After the data has been sent, another session can be initiated by using the RSET command or by sending another MAIL FROM command.

RSET: Resets the connection back to the state established after the initial EHLO/HELO.

NOOP: Sends a command that does nothing, usually to keep the connection alive.

QUIT: Ends the session and closes the connection.
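For comparison, the client side of the Figure 2-2 session can be driven from Python's standard smtplib. This is only a sketch: the RFC example hosts will not resolve, so a real server would have to be substituted to run it.

import smtplib

server = smtplib.SMTP("foo.com", 25)          # server greets with 220
server.ehlo("bar.com")                        # C: EHLO bar.com
print(server.mail("Smith@bar.com"))           # C: MAIL FROM:<Smith@bar.com>
print(server.rcpt("Jones@foo.com"))           # expect (250, b'OK')
print(server.rcpt("Green@foo.com"))           # expect (550, b'No such user here')
print(server.rcpt("Brown@foo.com"))           # expect (250, b'OK')
print(server.data(b"Blah blah blah...\r\n"))  # DATA body, terminated by <CRLF>.<CRLF>
server.quit()                                 # C: QUIT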

There are a few things to notice about the standard, mostly from the things it does not contain. SMTP does not require any type of authentication. Extensions for it exist, and RFC 5068 recommends the use of authentication for the submission of email from the MUA to the MSA, preferably over TCP port 587 (RFC 5068). Another purposeful omission was verification of the email itself. The SMTP server expects the message to be properly formatted as stated in RFC 5322, but it is not required to verify it. Verification of the fully qualified domain name (FQDN) in the HELO/EHLO command and verification of the existence of the MAIL FROM address are also missing.

One thing the standard did address properly is relaying. Any SMTP client can ask any SMTP server to send a message to any recipient outside the domains it handles. The server has the prerogative to accept and relay the message, or to reject the recipient. There are two reasons why an SMTP server would reject a recipient: the recipient does not exist on that server, or the server does not handle or relay email for that particular domain. Spammers often try to relay spam in order to avoid being blacklisted (see Section 2.5.2). However, the SMTP standard does not require SMTP servers to support relaying. It is highly recommended that SMTP servers disallow relaying to avoid being abused by spammers (RFC 2505).

Even though relaying was once a necessity, it is no longer needed today. In RFC 5321 we read:

In general, the availability of Mail eXchanger records in the domain name system makes the use of explicit source routes in the Internet mail system unnecessary. Many historical problems with the interpretation of explicit source routes have made their use undesirable. SMTP clients SHOULD NOT generate explicit source routes except under unusual circumstances. SMTP servers MAY decline to act as mail relays or to accept addresses that specify source routes (RFC 5321).

One other aspect of SMTP which should not be ignored is the three possible responses a server can give to an incoming message: accepted (2yz response code), temporary failure (4yz response code) and permanent failure (5yz response code). Note there is no "message received, please wait while we run your message through our spam filters" option. That means MTAs must make a relatively quick decision as to whether to accept or reject a message. RFC 2821 includes a section on minimum SMTP command timeouts. These timeouts apply per command, not to the entire transmission. Section 4.5.3.2 of RFC 2821 states the SMTP client should wait a minimum of 10 minutes after the message has been sent for a response from the SMTP server before considering the session lost (RFC 2821). This provides SMTP servers with at least 10 minutes to process the email and decide whether it is spam or not.

2.2.3 Sending Spam

Sending spam is not much different from sending legitimate email. "The very characteristics that make email such a convenient communications medium -- its near ubiquity, rapid delivery, low cost, and support for exchanges without prior arrangement -- have made it a fertile ground for the distribution of unwanted or malicious content" (RFC 5068). From the SMTP perspective, there is no difference between legitimate email and spam. SMTP, as designed, does not require or allow for the transmission of intent, which is the only difference between legitimate email and spam. This is why spam is so difficult to accurately detect. However, the many spam detection techniques explained later on are intended to do just that.

One aspect of the SMTP process in particular has proved detrimental to spammers: its reliability. The SMTP standard specifies that if an email message was not delivered, a notice must be sent back to the sender notifying him/her of the error (RFC 5321). If a spammer is not careful, he/she could quickly be inundated with error messages (bounce notices). To avoid the flood of bounced messages, spammers learned early on they could provide false return information on the email's envelope and redirect these bounces to another recipient or even to non-existent email addresses. Additionally, spammers hide their tracks by providing erroneous or invalid HELO and MAIL FROM information. An SMTP server implemented as specified by the SMTP standard does not care if it is lied to. In fact, it is designed to be forgiving of mistakes and to accept anything that may be good enough to be delivered. "Utility and predictability of the Internet mail system requires that messages that can be delivered should be delivered, regardless of any syntax or other faults associated with those messages and regardless of their content" (RFC 5321). Spammers, of course, have exploited the good will of the Internet in this regard.

2.3 History of Spam

The first piece of spam appeared on May 1, 1978 as an advertisement for a product demonstration by Digital Equipment Corporation over the ARPANET (Zdziarski, 2005). The reaction to this first piece of spam (although the term spam had not been coined yet) was quite strong from the ARPANET community and the ARPANET's managers, the U.S. Department of Defense. In one reply, Major Raymond Czahor stated the DEC message "was a flagrant violation of the use of ARPANET as the network is to be used for official U.S. government business only" (Templeton, n.d.).

During the 1980s, bulk emails such as chain letters and religious messages appeared from time to time in slowly increasing volume. In 1994 a husband-and-wife attorney team "decided to hire a programmer who could write software to post an advertisement to every single newsgroup in existence" (Zdziarski, 2005). The message, referred to in the literature as the Canter & Siegel spam, promoted the couple's immigration legal services. Their idea of automating the posting was revolutionary, since before this time spam was sent by entering individual email addresses by hand into a new message. In 1994 and 1995 spamware, or automated spam software, hit the market and several large companies emerged (Zdziarski, 2005).

Around 2003 a shift began in how spam was delivered, from relatively small distribution points to fully distributed methods using botnets (Seabrook, 2008). (Botnets are defined by Wikipedia as collections of compromised computers on the Internet.) By 2008 it was believed that 85% of all spam originated from only six different botnets (M86 Security, 2008). With so few botnets responsible for such a large amount of spam, one might think it would be an easy feat to end spam. That is not the case, however. Botnets are extremely resilient systems. They can be considered the Internet equivalent of organized crime: as soon as one system is shut down, another quickly takes its place (Leyden, 2012).

2.4 Types of Spam

Although the term spam is used loosely as a general term to describe unwanted email, it is important to distinguish the many different variations of spam.

2.4.1 Unsolicited Commercial Email

Unsolicited Commercial Email (UCE) was the most common type of spam from the mid-to-late 1990s through the early 2000s. Its primary purpose is to promote goods and services, covering a wide variety of offerings including adult content, health, IT, personal finance, education and training, politics and even anti-spam solutions (Securelist, n.d.).

The rise of UCE during the 1990s also led to a rise of questionable, underground companies. One common example is emails promoting sales of male enlargement pills through rogue online pharmacies. Another example, common in the mid 2000s, was stock promotion spam, which would send "positive 'touts' for a low-volume security in order to manipulate its price and thereby profit on an existing position in the stock" (Kanich, 2009).

2.4.2 Chain Letters

Chain letters became very popular in the early days of the Internet, from the 1980s to the early 1990s (Zdziarski, 2005). They basically asked users to forward the email to a number of other people in order to obtain good luck. Although not commercial, these types of emails are considered unwanted by most.

2.4.3 Scams

There are many scams sent by email. Some of the most popular are the Nigerian Letter scams, which found new life in the digital age. The FBI describes these scams on their website:

Nigerian letter frauds combine the threat of impersonation fraud with a variation of an advance fee scheme in which a letter mailed from Nigeria offers the recipient the "opportunity" to share in a percentage of millions of dollars that the author—a self-proclaimed government official—is trying to transfer illegally out of Nigeria. The recipient is encouraged to send information to the author, such as blank letterhead stationery, bank name and account numbers, and other identifying information using a fax number provided in the letter. Some of these letters have also been received via e-mail through the Internet. The scheme relies on convincing a willing victim, who has demonstrated a "propensity for larceny" by responding to the invitation, to send money to the author of the letter in Nigeria in several installments of increasing amounts for a variety of reasons. Payment of taxes, bribes to government officials, and legal fees are often described in great detail with the promise that all expenses will be reimbursed as soon as the funds are spirited out of Nigeria. In actuality, the millions of dollars do not exist, and the victim eventually ends up with nothing but loss. Once the victim stops sending money, the perpetrators have been known to use the personal information and checks that they received to impersonate the victim, draining bank accounts and credit card balances.

2.4.4 Malware Proliferation

Much of today’s spam is intended to spread malware. These messages purport to have information regarding a package delivery, offer software licenses, or even have information about tax payments or failed bank transfers (Websense, n.d.). These messages often contain infected attachments or links to infected websites to aid in the spread of various types of malware.

2.4.5 Phishing Attacks

Phishing attacks try to extract private information from users by creating legitimate-looking emails from common organizations which directly ask for usernames, passwords, account numbers, etc., or link to legitimate-looking websites which collect that information from users. Some pretend to be from PayPal, eBay, and a wide variety of banks. These messages are sent in bulk, cast a wide net, and often reach people who do not even do business with the company the messages pretend to represent. In contrast, a targeted form of phishing attack is called spear phishing, which, as described by the FBI on their website,

[targets] select groups of people with something in common—they work at the same company, bank at the same financial institution, attend the same college, order merchandise from the same website, etc. The e-mails are ostensibly sent from organizations or individuals the potential victims would normally get e-mails from, making them even more deceptive. . . First, criminals need some inside information on their targets to convince them the e-mails are legitimate. They often obtain it by hacking into an organization's computer network (which is what happened in the above case) or sometimes by combing through other websites, blogs, and social networking sites. Then, they send e-mails that look like the real thing to targeted victims, offering all sorts of urgent and legitimate-sounding explanations as to why they need your personal data. Finally, the victims are asked to click on a link inside the e-mail that takes them to a phony but realistic-looking website, where they are asked to provide passwords, account numbers, user IDs, access codes, PINs, etc.

2.5 Proposed Solutions

There have been many solutions proposed to solve the spam problem. This section summarizes some of the known approaches, explaining how they work, their benefits and their shortcomings.

2.5.1 Keywords, Phrases and Regular Expressions

One of the first solutions to combat spam was the use of keywords and phrases (Zdziarski, 2005). This solution is simple: the filter checks each message against a list of keywords, phrases, or even regular expressions commonly found in spam. If an entry matches, the message is labeled as spam.

This method is not considered very effective. Even though it can be very efficient at blocking certain emails, it can also have a high false positive rate (legitimate emails marked as spam). This method also requires large administrative resources, since each unique spam message must be individually dissected and programmed. False positives also have to be individually handled. In all this manual tuning, it is very easy to cause unintended consequences which can be very difficult to correct (Zdziarski, 2005). Because of its high administrative cost and low performance, this type of filtering is strongly discouraged.
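As a minimal illustration (not drawn from the thesis), a keyword/regular-expression filter of this kind can be just a few lines; the patterns below are hypothetical examples of the kind of entries such lists contained:

import re

# Hypothetical spam indicators; real lists grew to thousands of entries.
SPAM_PATTERNS = [
    re.compile(r"\bfree money\b", re.IGNORECASE),
    re.compile(r"v[i1]agra", re.IGNORECASE),
    re.compile(r"act now.{0,20}limited time", re.IGNORECASE | re.DOTALL),
]

def is_spam(message: str) -> bool:
    # Label the message spam if any single pattern matches -- the
    # all-or-nothing behavior that causes this method's false positives.
    return any(p.search(message) for p in SPAM_PATTERNS)

print(is_spam("Act now! For a limited time only: FREE MONEY"))  # True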

2.5.2 Blacklists

Blacklists are a simple but effective approach. The idea is to make a list of IP addresses which are known to send spam and block them outright (Jacob, 2003). Blacklists can either be manually created by each organization or obtained from blacklist services. In 1996 Spamhaus became the first non-profit blacklist provider (Zdziarski, 2005).

Blacklists are not perfect, however. For any given IP address it is an all-or-nothing approach. There is also the inconvenience of being wrongfully listed in a blacklist. This can happen due to abusive behavior by someone who used the IP address previously, or with whom you may share the IP address in some hosting scenarios, or simply because the blacklist administrator felt like it (Jacob, 2003).

The introduction of IPv6 threatens to destabilize blacklists. Since blacklists are keyed mostly by IP address, the new addressing scheme will force blacklist administrators to recreate a lot of their work (Leyden, 2011).
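Many blacklists are published as DNS zones (DNSBLs). The sketch below is illustrative only: it queries a DNSBL by reversing the octets of the client IP, and the zone name is a placeholder, not a real provider.

import socket

def is_blacklisted(ip: str, zone: str = "dnsbl.example.org") -> bool:
    # DNSBL convention: reverse the IPv4 octets and prepend them to the
    # blacklist zone; a successful A-record lookup means "listed."
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        socket.gethostbyname(query)
        return True
    except socket.gaierror:
        return False

print(is_blacklisted("192.0.2.1"))  # False unless the placeholder zone lists it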

2.5.3 Whitelists

Whitelists, the reverse of blacklists, provide a list of IP addresses that are allowed to send email. If a particular IP address is not on the list, it will be blocked and not allowed to send email. As with blacklists, whitelists are hard to maintain, are not exhaustive by nature, and tend to block much more than just spam (Zdziarski, 2005). They can have a legitimate use in closed systems that only talk to other trusted systems.

2.5.4 Heuristics

Heuristics is a general term used to describe rules or checks performed on email. An example would be checking for a valid, fully qualified domain name (FQDN) in the HELO/EHLO command. Heuristics are a mixed bag; they can perform anywhere from really well to very poorly. One way to improve performance is to combine multiple heuristics in a point-based system so that no single heuristic is entirely responsible for flagging an email as spam, as sketched below. The decision is deferred until all heuristics have been considered, adding and subtracting all the necessary points. If the total score is higher than some configurable threshold, the email is flagged as spam and rejected (Zdziarski, 2005).
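A minimal sketch of such point-based scoring (illustrative only; the rule names, weights and threshold are hypothetical):

# Each heuristic returns True when it fires; weights are hand-assigned.
RULES = [
    (2.5, lambda m: "HELO_IS_IP" in m["flags"]),      # bare IP in HELO/EHLO
    (1.2, lambda m: "viagra" in m["body"].lower()),   # keyword hit
    (-1.0, lambda m: m["from_known_sender"]),         # negative score = ham-like
]
THRESHOLD = 3.0

def score(message: dict) -> float:
    # Sum the weights of every rule that fires; no single rule decides.
    return sum(w for w, rule in RULES if rule(message))

msg = {"flags": ["HELO_IS_IP"], "body": "Cheap Viagra!", "from_known_sender": False}
print(score(msg), score(msg) >= THRESHOLD)  # 3.7 True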

SpamAssassin, perhaps the most common spam-fighting tool in use, is heuristic-based. In version 3.3.x (the current stable branch at the time of writing), SpamAssassin comes with 356 different heuristics. They check a variety of things, from header structure to keywords and phrases to remote client identification (SpamAssassin Documentation, n.d.).

Proper heuristics offer acceptable performance, although they carry a very high administrative cost. Heuristics are nearly always hand-written, and even though their point systems can be tuned with algorithms, oftentimes they are manually adjusted (Youngman, 2004).

2.5.5 Machine Classifiers

Machine classifiers are often used in Natural Language Processing. They are capable of classifying text into predefined categories. Machine classifiers are usually built on top of machine learning algorithms to create probabilistic models for each category. When a classifier classifies text similar to the text given in training, it returns the most probable category for the text (Manning, 2004). There are many different types of machine classifiers, such as Naive Bayes, CRM114 (Zdziarski, 2005), Maximum Entropy (Manning, 2004), Winnow (Chhabra, 2005), Decision Trees, Random Forests, Neural Networks (Caruana, 2006) and Support Vector Machines (SVMs) (Joachims, 2002). Each method is usually built to solve a particular problem, so performance can vary greatly depending on the problem it is applied to. (See Caruana, 2006 for explanations and comparisons of many of these classifiers.)

Classifiers, when used as spam filters, are usually configured with two categories or classes: spam and not spam. SpamBayes offers an exception, adding the ability to return an unsure classification for messages which are difficult to predict (Spambayes, n.d.).

Overall, text classifiers can be divided into two categories depending on the statistical model they use: generative (such as Naive Bayes and CRM114) or non-generative (such as Maximum Entropy, Winnow, and SVM). Generative models get their name from the ability to generate new, representative documents for each category in their models. In general, they keep separate statistical models for each defined category. Non-generative models (also known as conditional models or discriminatively trained models) are not able to generate any documents. In contrast to generative models, non-generative models contain a single statistical model which, during the training phase, systematically finds the differences in the documents that help determine the category they belong to. Non-generative models are able to calculate the posterior probability (the probability a document belongs to a particular category given its features) directly, while generative models must do it from the joint (generative) distribution using Bayes' Law (Ringger, 2012).
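Concretely, for a document d and category c, a generative model estimates P(d | c) and P(c) and recovers the posterior through Bayes' Law (standard notation, shown here for clarity):

P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)} = \frac{P(d \mid c)\, P(c)}{\sum_{c'} P(d \mid c')\, P(c')}

A discriminative model such as Maximum Entropy or an SVM models the decision for c given d directly and never forms the joint distribution.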

One aspect affecting the performance of classifiers is feature extraction. The literature seems to favor single-word features, which represent email messages by their individual words. Word features work fairly well, but can suffer from feature deprivation, a condition where features that were never seen in training are not accounted for in the model (Kanaris, 2006).

Another approach which reduces the possibility of feature deprivation is using character-based features. Even though it may seem counterintuitive, character-based spam filtering can still be quite effective. One benefit of character-based features is that the feature space is theoretically smaller, meaning there is a higher probability of accounting for most features during training (Kanaris, 2006).
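For illustration (this is not the thesis's feature extractor), character n-grams can be produced with a simple sliding window; compare the samples in Table 3-1:

def char_ngrams(text: str, n: int = 3) -> list[str]:
    # Slide an n-character window across the text; each window is a feature.
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("free money", 3))
# ['fre', 'ree', 'ee ', 'e m', ' mo', 'mon', 'one', 'ney']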

2.5.6 Challenge and Response Systems

An idea suggested and implemented in the late 1990s was the challenge/response model. Whenever a new sender tried to deliver email to a mailbox, the system would place the delivery of the message on hold and automatically reply to the sender with a confirmation email. If the original sender wanted the email delivered, the sender would have to reply to the confirmation email using the same email address as the original message. On successful confirmation, the original email would be delivered to the recipient. If several emails arrived from the same sender, only one confirmation email would go out (if designed properly) and all of the held messages would be released upon confirmation. The reasoning behind this approach was that real senders would receive the confirmation email and reply, while spammers would not be interested in replying to every single confirmation from all the spam they had sent. In a way, proper delivery required cooperation from the sender and shifted some of the burden back onto spammers (Templeton, n.d.).

This idea worked a little too well: the user would see little flowing into his/her inbox, in terms of both email and spam. Most legitimate senders would reply, and some spammers as well. But just as this approach put an extra burden on spammers, it also added to the burden of regular users. Some people refused to reply to these automated messages (SpamCop FAQ, n.d.).

In terms of solving the problems with spam, this approach relieved inboxes (often relieving them too much) but also increased bandwidth, since many messages would require a confirmation email. Storage requirements were not really reduced, since much of the spam was stored awaiting confirmation. Finally, there was the issue of forgery. Since spammers often forge email addresses, confirmation emails were either sent to unsuspecting users or bounced by the innocent domains (SpamCop FAQ, n.d.). Thus, this approach has been mostly abandoned.

2.5.7 Greylisting

Greylisting can be considered a variation on whitelists, blacklists and challenge/response systems, except that it all happens at the machine level, not the user level. The idea is that any new sender will be temporarily blocked until the sender's identity can be confirmed. Instead of sending a confirmation email to the sender, the receiving SMTP server temporarily denies the message with a 4yz response after the message has been received. This forces the sender's server to retry sending the same message some time later. If the resent message matches the first attempt (which it should), the sender is added to the whitelist and the original message is delivered. If, however, the message is not resent, then the message is never delivered. This method works because most spam is not resent; resending spam would go against the economics of bulk deliveries (Yamai, 2008).
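A minimal sketch of the bookkeeping involved (illustrative only; real implementations add record expiry, per-subnet matching and persistent storage):

import time

seen = {}           # (client_ip, sender, recipient) -> first-seen timestamp
RETRY_MIN = 300     # seconds the sender must wait before a retry is accepted

def greylist(client_ip: str, sender: str, recipient: str) -> str:
    key = (client_ip, sender, recipient)
    now = time.time()
    if key not in seen:
        seen[key] = now
        return "450 try again later"      # first attempt: temporary (4yz) rejection
    if now - seen[key] >= RETRY_MIN:
        return "250 OK"                   # legitimate retry: accept
    return "450 try again later"          # retried too soon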

This method has proved very effective. However, it is not perfect. First, additional bandwidth is consumed by forcing the sending server to resend a message; this problem is exacerbated by large attachments. Additionally, extra care needs to be taken when deciding what constitutes the same message. Large organizations with multiple outgoing mail servers cannot guarantee the email will be resent by the same server, so precautions need to be taken to ensure the message can be delivered from multiple servers. Finally, greylisting introduces noticeable delays in delivery. Standards suggest waiting at least 30 minutes before resending a message, thus adding 30 minutes to the delivery of email (RFC 2821).

In an effort to improve on the delay aspect of greylisting, Yamai suggests that closing the connection without warning (instead of providing a proper 4yz response) may be a better approach. Based on his presentation, a dropped connection triggers a near-immediate retry from the server, removing the time delay. Additionally, the message is usually retried from the same server, removing the problem of retransmissions from another host (Yamai, 2008).

2.5.8 Tarpitting

Tarpitting is another technique aimed at hurting the cost margins of spam. It was developed in the mid 2000s, aimed at the many penny-stock scams. The technique works by purposely slowing down communications between the client and the server, thereby slowing the spammer. With incredibly low reply rates (estimated at 0.003% by Kanich, 2008), a spammer must send millions of messages in order to make a profit. By tarpitting, the spammer is unable to send the large amounts of spam required to make a profit in a fast enough time frame (Eggendorfer, 2007).

In order to understand this technique better, it is important to understand the delivery times involved and the latencies clients are expected to tolerate. A regular email message (one without large attachments) between moderately busy servers can be delivered within a few seconds. However, when combined, the minimum timeout periods suggested in RFC 2821 allow up to 30 minutes of wait time for a single message. If a simple message can be delivered in less than 3 seconds but SMTP clients should be designed to wait at least 1,800 seconds, then one could potentially slow down the delivery of spam by a factor of 600. Of course, legitimate email can also be slowed down to those levels. In practice, 30-minute delays are not typical; even delays of about five minutes are enough to deter impatient spammers while still allowing legitimate mail to come through at a more regular pace.
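At its core, a tarpit is nothing more than inserted delay. A toy sketch (illustrative only; a real tarpit would sit inside the SMTP server's command loop and delay every response, not just the banner):

import socket
import time

# Accept a connection, then hold the client before even sending the banner.
srv = socket.socket()
srv.bind(("0.0.0.0", 2525))   # toy port; real SMTP listens on 25
srv.listen(1)
conn, addr = srv.accept()
time.sleep(300)               # make the client wait five minutes
conn.sendall(b"220 slow.example.com ESMTP\r\n")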

There are two drawbacks to this solution. First, there is the delay imposed on all email. Second, there is the extra load on mail servers, since they must cope with a much larger number of simultaneous connections (Eggendorfer, 2007).

2.5.9 Sender Policy Framework (SPF)

Sender Policy Framework (or SPF) is one of the first standards-based approaches to the spam problem. SPF began as a solution for authenticating the outgoing mail servers of a particular domain. It had its start with an idea by Hadmut Danisch of publishing Reverse MX (RMX) records through DNS. That way, an SMTP server could get a list of valid outgoing mail servers in a way similar to how it gets a list of incoming mail servers. The basic idea of RMX was to reduce forgery by providing a way to verify whether a particular client attempting to send email was a valid outgoing server for that domain.

SPF took the basics of RMX and improved on them by allowing the publishing of policies regarding outgoing email for a particular domain, authorizing or blocking IP addresses and subnets. SPF also allows for simple configurations for domains where the outgoing servers are the same as the incoming mail servers, by simply authorizing all servers listed in the MX record (RFC 4408). The different SPF policies (or validation outcomes) allow for great flexibility in how to deal with apparently rogue email. The SPF policies specified in RFC 4408 are listed below, followed by a sample record.

• PASS: The specified IP or host is allowed to send email for the domain.

• NEUTRAL: The sender is neither authorized to send nor prohibited from sending email for the domain.

• SOFT FAIL: The use of this host for sending email for the domain is discouraged.

• HARD FAIL: The host is prohibited from sending email for the domain.
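For illustration, here is a hypothetical SPF record published as a DNS TXT record (the domain and subnet are placeholder example values):

example.com.  IN TXT  "v=spf1 mx ip4:192.0.2.0/24 ~all"

Here "mx" authorizes the hosts in the domain's MX records and "ip4:" authorizes a subnet (both yielding PASS for matching clients); the trailing "~all" soft-fails everything else, while "-all" would hard-fail it and "?all" would leave it neutral.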

SPF has had some success, although some people don't think it has done anything at all. Media reports early on seemed to suggest more spammers were using SPF than legitimate domains. This seems to be a continuing problem: J. Goodman from Microsoft reported in 2007 that 40% of the servers that sent email to Hotmail had SenderID records (Goodman, 2007). But SPF seems to get its bad reputation from a misunderstanding of what the technology actually does. When it was first released, it was touted as the solution that would end spam. That obviously did not happen, and for that reason many dismissed it. SPF is no miracle cure, but it does exactly what it was intended to do: authenticate sender MTAs for a domain. Of course, spammers can authenticate as well, and the biggest complaints stem from unconditionally trusting SPF records, or from not doing additional checks on neutral policies, which should be treated as equivalent to having no SPF record at all (RFC 4408).

2.5.10 Sender ID

Sender ID is a variation of SPF developed by Microsoft which can authenticate the MAIL FROM field along with what it calls the Purported Responsible Address (PRA) obtained from the headers. Sender ID works in a very similar way to SPF, using DNS records to publish its information (RFC 4406).

Sender ID has not been very popular outside of Microsoft's partners, since several of its technologies are patent-encumbered and cannot be implemented in open source projects (Apache Foundation, 2004). For those projects SPF is preferred.

2.5.11 Relay Header Detection

In cases where SPF or Sender ID records do not exist, there are other DNS queries which can be performed to ascertain the validity of the EHLO greeting and MAIL FROM commands. Relay Header Detection specifies the order and type of DNS queries to perform to ensure the message comes from a reputable source. In testing, this technique was able to filter email with over 99% accuracy and very low false positives (Treviño, 2007b).

2.5.12 DomainKeys Identified Email

DomainKeys Identified Email (DKIM) is a digital signature system for email created by Yahoo!. It serves as a way of ensuring that a valid server for a domain sent a particular message. It works by adding a DKIM-Signature header to the message containing the domain information along with a digital signature. The receiving MTA can then obtain a copy of the public key and verify the digital signature (RFC 4871).

Just like SPF and Sender ID, DKIM suffers from adoption problems. Additionally, having a signed email does not mean you won't have spam; spammers can sign messages too. And, ironically, Yahoo! enables some spam itself when spammers create and use Yahoo! accounts to distribute it. However, its weaknesses should not deter its use.

27 2.5.13 DMARC

On January 30, 2012, Google, Facebook, Microsoft, PayPal and several other Internet giants unveiled a new plan to combat phishing attacks. Their approach, called Domain-based Message Authentication, Reporting & Conformance (DMARC), hopes to improve on the existing SPF/Sender ID and DKIM standards. From the DMARC website:

DMARC standardizes how email receivers perform email authentication using the well-known SPF and DKIM mechanisms. This means that senders will experience consistent authentication results for their messages at AOL, Gmail, Hotmail, Yahoo! and any other email receiver implementing DMARC. We hope this will encourage senders to more broadly authenticate their outbound email which can make email a more reliable way to communicate.

The number of industry heavyweights which have come together to form this new approach is a testament to the extent and difficulty of the phishing problem. Time will tell how much this new approach helps reduce the number of phishing attacks.

2.5.14 Legislation

Several countries, including the United States and members of the European Union, have enacted laws regulating the mass distribution of unsolicited commercial email and providing punishments for those who break them. The U.S. law (known as the CAN-SPAM Act) was signed into law in December 2003. Since then, it has provided means for Internet Service Providers (ISPs) and law enforcement to charge, prosecute and incarcerate spammers, and several prominent spammers have been successfully convicted. One well-known example was the conviction of Alan Ralsky, known for running many stock fraud scams (Leyden, 2008).

The laws in the United States don't ban all bulk or commercial email. However, they state that bulk commercial email must provide the following:

1. Legitimate information about the sender in the headers.

2. A mention that the email is an advertisement.

3. Instructions on how to be removed from the sender’s list.

4. Human-readable information about the origins of the message (FTC).

Legislation by itself has done little to stop spam, however. It would appear most spammers are not concerned about breaking any laws. Before federal legislation was passed in the United States, several states had created their own laws, which often conflicted with each other. These conflicts made it hard for spammers to comply with multiple laws simultaneously in one message, giving them an extra reason to ignore such laws.

2.5.15 SMTP Proxies

SMTP proxies act as intermediary SMTP servers between Internet SMTP servers and an organization's incoming SMTP servers. By sitting in the middle, they attempt to catch spam before it reaches the SMTP server that delivers email to users. The most interesting implementation is a product called ASSP (or @SSP), which stands for Anti-Spam SMTP Proxy. It is written in Perl and listens for incoming SMTP connections, relaying commands to a standard SMTP server listening on another address or port. ASSP then applies most of the techniques described so far to filter spam, ensuring it never reaches the actual SMTP server (Hanna, n.d.).

ASSP unfortunately lacks good documentation. It has too many options without any explanation of how they work or how they should (or should not) be combined. Overall, this solution followed the recommendation made in RFC 2505 to implement spam filtering at the SMTP level. Additionally, by implementing a command proxy model, the developers were able to ensure quick deployment and compatibility with all SMTP servers without having to write delivery modules.

2.6 Current State of the Commercial Industry

The IT industry has created some formidable anti-spam solutions. Some of these solutions have been so effective that one editor of Communications of the ACM wondered if the spam war had been won (Dacier, 2009). That is the real advantage of a commercial solution: if you have the budget, you can hire a company to take care of the problem for you. This type of solution is very appealing for many organizations since it will virtually eliminate spam without requiring investments in infrastructure, personnel and training. However, these solutions are not a panacea. First, there are the contract fees for the services. Additionally, the organization loses some control over its email and filtering. Granted, these services give users some control, but if what you want to do is outside of the company's model, you may be out of luck.

Then there is fear. If an organization deems its email communications too valuable to hand over to an external company, it may have to continue doing its spam filtering in-house.

There is also the arms race between companies. Most of these companies will not disclose their methods. This secrecy greatly hurts the academic world since it is very difficult to pursue academic research without stepping on some toes or someone’s cash cow.

Here is a quick review of some of the most popular commercial solutions.

2.6.1 Barracuda Networks

Barracuda Networks provides a hardware appliance which handles incoming and outgoing mail for an entire company. It also offers its services as virtual appliances and hosted services. The appliance approach has the advantage of allowing organizations to maintain control over their email systems.

The information on Barracuda Networks' Spam and Virus Firewall website mentions the product provides anti-spam, anti-virus, anti-spoofing, anti-phishing, anti-spyware, denial-of-service, and data leak protection through "twelve defense layers: network denial of service protection, rate control, IP reputation analysis, sender authentication, recipient verification, virus scanning, policy (user-specified rules), spam fingerprint check, intent analysis, image analysis, bayesian analysis, and rule-based scoring." One key selling point is its simplicity, promising easy installation and management. However, this is about as much information as they provide about their methodology to the general public.

2.6.2 MessageLabs

MessageLabs (now a division of Symantec) is a UK-based company providing spam filtering, email virus protection, and encryption solutions. Their emphasis is on being a worry-free solution: if a company contracts their services, they will take care of everything.

MessageLabs will monitor the performance of its filters to ensure clients get what they paid for. Currently, the product website advertises 99% accuracy with less than a 0.0003% false-positive rate. In order to achieve this high accuracy, Andrew Oakley, anti-spam technical architect at MessageLabs, said they employ a large staff to constantly monitor and modify their filtering technology as necessary. In their view, man is the best filter.

Organizations deploying MessageLabs' services do so by pointing all their MX records to MessageLabs. From there, MessageLabs dynamically filters spam and viruses and forwards legitimate email to the organization's mail servers. That way organizations can deploy whatever email management solutions suit their needs. Their services are considered premium and therefore are not cheap. As with most commercial solutions, the details of their technology are a trade secret.

2.6.3 Google Gmail

With the introduction of Gmail, Google has also entered the spam filtering market. From its beginnings, Gmail has boasted an impressive spam filter which remains mostly secret.

Gmail offers a complete cloud-based email solution. Email is stored on Google's servers, and all appropriate MX and SPF records point to their servers. Users then connect to Gmail to send and retrieve email. Gmail is thus a complete email solution compared to Barracuda Networks or MessageLabs, which still require the organization to manage storage and delivery. This full approach, combined with a lower cost, makes Gmail a very attractive solution.

When it comes to spam filtering, however, Gmail has one weakness. Since there is little separation between the free, personal Gmail service and the commercial services Google offers, spammers are able to test their latest assaults on Gmail addresses they control. Once they successfully penetrate the filters, they can unleash their campaigns on all Gmail users, private and commercial. This problem is not unique to Gmail, however; Yahoo!, Hotmail and other large email providers have the same problem.

2.6.4 MailRoute

MailRoute is another company providing high-quality, low-cost spam filtering. Their solution works by rerouting email through their service, similar to the way MessageLabs does. The MailRoute website describes the product as a "multi-layered spam filtering system that identifies up to 96% of inbound spam" and "achieves an extremely low false-positive rate - 1 in 250,000 messages." The website also states MailRoute's multi-layered approach uses greylisting, blacklists, fingerprint databases, lexical analysis, bayesian filtering, distributed traffic analysis and user-configured whitelists and blacklists.

2.7 Conclusions on the Literature Review

The arrival and growth of spam can be attributed to several factors: loose standards made worse by loose implementations of those standards, a trusting community, and a group of people willing to move marketing into new areas. Much has been done about spam throughout its nearly 18 years of uncontrolled growth. Some solutions did not work, while others have stuck around for a long time. Companies have proved it is possible to achieve very high accuracy and precision, but the academic world does not really know how.

Ultimately, we see that in spite of our efforts to end spam, spammers continue to evolve, adapt and bombard the world. So even if it seems the anti-spam world has the upper hand, the battle continues.

3 RESEARCH PROCEDURES

3.1 Finding Room for Improvement

Several researchers have suggested different approaches for filtering spam. However, there is little agreement on how to deal effectively with phishing attacks. Several sources hint that text classifiers are not up to the task. Putting together the pieces from the literature, several things become apparent:

1. The majority of the free and/or open spam content filters are based on Naive Bayes (a generative model).

2. Naive Bayes classifiers are susceptible to performance degradation when the documents share many common features.

3. Creating a third category for phish may help isolate the features which may be unique to phishing email and therefore improve performance.

4. Non-generative models often achieve better performance than generative models since their training algorithms will ignore common features and focus only on the key differences between the documents. Non-generative models may therefore deal better with phishing attacks.

The literature contains much information about using non-generative text classifiers to identify and eliminate spam. However, missing from the literature were attempts at using a three-way (or three-class) classifier to classify emails into ham, spam and phish. Creating and testing this three-way classifier is the focus and main hypothesis of this thesis.

3.2 Overall Approach

To test the hypothesis, it will be necessary to put together a labeled email corpus that contains legitimate emails (ham), phishing emails (phish) and unwanted commercial email (spam). Once the corpus is built, a feature extractor will extract generic features from each of the emails and put them in a format which can be used by a freely available tool to train and classify the corpora. Once the classifier can be trained and validated, it will be necessary to adjust any parameters it may have to ensure the best possible classification performance.

Once a solid set of features and classifier parameters are found, a 10-way cross-validation will be run to test the entire corpus (see Section 3.5 for details) to find the filtering performance of the text classifier. First, performance will be measured with two categories (ham and spam), then with three categories (ham, spam and phish). Once those numbers are found, it will be possible to determine if using three categories improved phish detection without impacting the filtering accuracy of spam and ham.

3.3 Building the Necessary Corpora

There are three main corpora which are used in research because they contain both legitimate and junk emails. The first is the SpamAssassin corpus. This is a labeled corpus (labeled meaning messages have been separated into ham and spam). The corpus is further divided into two parts: easy and hard. When combined, the corpus contains 6,951 ham messages and 2,398 spam messages. The ham portion contains some personal emails, but many are subscription-type messages with daily or weekly deals. Some may consider these messages unwanted. However, since these emails came from valid subscriptions, they are technically wanted and are not spam.

The second labeled corpus is the Ling-Spam corpus (Androutsopoulos, 2000). This corpus comes from the Linguist mailing list. The list is considered public, so publishing the emails does not create a security problem. However, in order to provide some level of security to its subscribers, some information has been removed. Additionally, the messages come pre-parsed and formatted for NLP tools and do not conform to email message standards. This corpus is therefore ideal for testing language tools, but inadequate for analyzing email structure information.

The third labeled corpus is the 2005 TREC Public Spam Corpus. It contains 92,189 total messages (39,399 are ham and 52,790 are spam). The messages come from Enron company emails made public by the Federal Energy Regulatory Commission during its trials. The headers contain very little modification and the messages contain Enron business emails without attachments (TREC, 2005).

Finally, Jose Nazario has collected phishing emails for several years and made them available, with slight header modification. There are four compilations total. The newest compilation has phishing emails from August 2006 to August 2007 and contains 2,279 phishing emails (Nazario, 2007).

3.4 Feature Extraction

Feature extraction refers to the process of extracting data from documents which can help the classifier identify the document. There are many different ways to extract features for different applications. For some classification problems, extracting certain keywords is enough (Joachims, 2002). For other problems, all words and perhaps even word pairs are extracted. For problems where the document only contains a couple of words, characters are extracted to form character distributions (NLTK Documentation).

Kanaris and several others have suggested that using character distributions in the form of character-based n-grams yields excellent performance (Kanaris, 2006). In this approach all characters are used, including white space, punctuation and control characters. Table 3-1 shows the features that would be extracted from the phrase Hello World! using single characters (unigrams), character pairs (bigrams) and character triplets (trigrams).

Table 3-1: Sample Character N-grams

N-gram size   Features
Unigram       H e l l o [sp] W o r l d !
Bigram        He el ll lo o[sp] [sp]W Wo or rl ld d!
Trigram       Hel ell llo lo[sp] o[sp]W [sp]Wo Wor orl rld ld!
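As a minimal illustration of this kind of extraction (the actual feature extractor appears in Appendix E), the following Python sketch reproduces the distributions behind Table 3-1:

    from collections import Counter

    def char_ngrams(text, n):
        """Count every character n-gram in text, including spaces and punctuation."""
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    # Unigrams, bigrams and trigrams of the phrase in Table 3-1
    for n in (1, 2, 3):
        print(char_ngrams("Hello World!", n))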

Character-based n-grams provide several benefits over word-based n-grams. Kanaris has noted:

Character n-grams are able to capture information on various levels: lexical (‘the ’, ‘free’), word-class (‘ed ’, ‘ing ’), structural (‘!!!’, ‘f.r.’). In addition, they are robust to grammatical errors and strange usage of abbreviations, punctuation marks etc. The bag of character n-grams representation is language-independent and does not require any text preprocessing (tokenizer, lemmatizer, or other ‘deep’ NLP tools). It has already been used in several tasks including language identification, authorship attribution, and topic-based text categorization with remarkable results in comparison to word-based representations. An important characteristic of the n-grams on the character-level is that it avoids (at least to a great extent) the problem of sparse data that arises when using n-grams on the word level. That is, there is much less character combinations than word combinations, therefore, less n-grams will have zero frequency (Kanaris, 2006).

Additional considerations for the use of character-based n-grams are:

• They deal better with obfuscation, such as writing Viagra as \/|@6R/-\ or using HTML tables to obfuscate words

• They work with all languages, even glyph-based ones, without special modifications

• They can detect syntax styles in the way the email, HTML or JavaScript is written

It is also important to note email messages have structure which can be used during feature extraction. For example, it is possible to construct different character distributions for the body of the email and the subject heading. It is also possible to use other headers such as the To, From, Return-Path and/or Received headers, which are often forged by spammers.

Using character n-grams over words may seem unintuitive and perhaps even inefficient at first. To understand their value we have to remember that spam filtering is not a typical classification problem. Unlike other classification problems (such as finding sports or political news in a news feed), spammers are often trying to evade proper classification. Take, for example, the following subject from a spam message, taking notice of the space between letters and the hyphens between words:

Y o u r - A D - on 2 - M i l l i o n - W e b s i t e s

A word-based feature extractor would extract each individual letter and not make much sense of them. A character-based unigram feature extractor would extract the same letters whether or not the spaces were there and still notice key tokens like "AD." Additionally, the feature extractor would note a very high number of spaces, which rarely happens in legitimate messages. Thus the features given to the classifier capture word-equivalent information plus additional information to aid in the classification process.

Consider another example: the email's From field. Character distributions are very capable of discerning minute differences in personal names. A sender name of p4ZogeKa3Mahoney looks very different statistically from John Smith. In fact, character-based n-grams are so sensitive they can even distinguish gender (NLTK Documentation, n.d.).

Best of all, character-based features achieve all of these benefits by simply collecting individual characters and groups of characters. There is no need for anti-obfuscation measures or lexical analyzers to improve feature extraction.

During the implementation of the feature extractor it will be necessary to test its performance and find the best combination of features which can produce respectable classification performance without requiring an immense amount of computing power or storage. Respectable performance will be defined as accuracy greater than 95%. During this process it will be necessary to employ feature engineering to improve classifier performance. Feature engineering is an iterative process which includes feature design, selection, induction and impact analysis (Ringger, 2005). As a rule, however, all features remain generic. That is, no features match specific characteristics of the corpora, such as specific phrases, senders, recipients, subjects or even words.

Extracted features are stored in the SVMlight file format. This format combines all documents into one file. Documents are represented individually, one per line, with a numeric class identifier first, followed by the features separated by spaces. Features are represented in pairs delimited by colons. The first value in the pair is the feature number; the second is the number of times the feature appeared in the document. Feature numbers may be skipped for features which were not present in the document (SVMlight Documentation, n.d.).
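For illustration, here is a minimal sketch that serializes one document into this format; the assignment of feature numbers is assumed to happen elsewhere:

    def to_svmlight_line(class_id, feature_counts):
        """Format one document as '<class> <feature>:<count> ...' with ascending feature numbers."""
        pairs = " ".join(f"{f}:{c}" for f, c in sorted(feature_counts.items()))
        return f"{class_id} {pairs}"

    # A ham document (class 1) where feature 3 appeared twice and feature 17 once
    print(to_svmlight_line(1, {17: 1, 3: 2}))  # prints: 1 3:2 17:1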

3.5 Training and Classification

In order to avoid the bias that may come from the selection of training and validation groups, a ten-way cross-validation will be used in all training and classification runs. The cross-validation is done by dividing the individual corpora into ten equal sections. Then, nine of the ten parts are chosen to serve as the training corpus, and the tenth part is used for validation and performance measurements. Ten iterations are performed, ensuring each of the ten groups is used as the validation group exactly once. The raw performance numbers are then added together, allowing us to classify the entire corpus (Kanaris, 2006).
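The procedure can be sketched as follows; train_fn and eval_fn stand in for whatever classifier is being tested and are placeholders of my own, not names used elsewhere in this thesis:

    def cross_validate(folds, train_fn, eval_fn):
        """k-way cross-validation: each fold serves as the validation set exactly once."""
        results = []
        for i, held_out in enumerate(folds):
            # All folds except the held-out one form the training corpus
            training = [doc for j, fold in enumerate(folds) if j != i for doc in fold]
            model = train_fn(training)
            results.append(eval_fn(model, held_out))
        return results  # raw per-fold numbers, summed afterward to cover the whole corpus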

3.6 Finding a Classifier

As mentioned in Section 3.1, non-generative classifiers offer excellent performance when classifying spam. Some non-generative classifiers include Maximum Entropy (MaxEnt), Support Vector Machines (SVM), Decision Trees and Neural Networks. Maximum Entropy and Support Vector Machines will be considered here because their use is well established in the literature and they offer high performance (Zhang, 2004; Kanaris, 2006).

3.6.1 Maximum Entropy

Maximum Entropy is a model which tries to find a distribution with the highest entropy. Entropy, in information theory, "measures the amount of information in a random variable" (Manning, 2000). Manning and Schütze define Maximum Entropy modeling as follows:

Maximum entropy modeling is a framework for integrating information from many heterogeneous information sources for classification. The data for a classification problem is described as a (potentially large) number of features. These features can be quite complex and allow the experimenter to make use of prior knowledge about what types of information are expected to be important for classification. Each feature corresponds to a constraint on the model. We then compute the maximum entropy model, the model with maximum entropy of all models that satisfy the constraints. . . . Choosing the maximum entropy model is motivated by the desire to preserve as much uncertainty as possible. . . . In maximum entropy modeling, feature selection and training are usually integrated. Ideally, this enables us to specify all potentially relevant information at the beginning, and then to let the training procedure worry about how to come up with the best model for classification (Manning, 2000).

A publicly available NLP framework with Maximum Entropy support is the Natural Language Toolkit (NLTK). This framework is written in Python. Unfortunately, preliminary testing proved it to be slow and incapable of handling the large feature sets required for this application.

Another publicly available toolkit is Mallet (MAchine Learning for LanguagE Toolkit), courtesy of the University of Massachusetts at Amherst (McCallum, 2002). This toolkit has the advantage that it is ready to run and only requires data. It even includes a simple word-based feature extractor and a tool to convert SVMlight data to its own vector format.

3.6.2 Support Vector Machines

Support Vector Machines attempt to find a hyperplane that can separate two distinct sets of documents. Joachims explains it as follows:

Let’s assume that the training data can be separated by at least one hyperplane h'. This means that there is a weight vector w and a threshold b', so that all positive training examples are on one side of the hyperplane, while negative training examples lie on the other side (Joachims, 2002).

[Figure 3-1: SVM Hyperplane Classification]

In general, there can be multiple hyperplanes that separate the training data without error. . . . From these separating hyperplanes, the support vector machine chooses the one with the largest margin δ. (Joachims, 2002).

[Figure 3-2: SVM Margin and the Hyperplane]

The simplest hyperplane that can divide the two sets of documents is a straight line, either with a hard margin (as shown in Figure 3-2) or with a soft margin that tolerates some training errors (Joachims, 2002). Some real-world data sets do not fit into linear models. SVMs allow for non-linear separation through polynomial hyperplanes or the use of kernel functions which map a non-linear space into an equivalent linear space (Joachims, 2002).

One drawback of SVMs is that they only consider binary classification with positive and negative documents. Fortunately, it is possible to do multi-category classification with SVMs. One way is through "reducing a single multiclass problem into multiple binary problems" (Crammer, 2001). This solution is considered suboptimal since "it cannot capture correlations between the different classes since it breaks a multiclass problem into multiple independent binary problems" (Crammer, 2001). Crammer and Singer instead suggested a different approach in 2001 with a "simple generalization of separating hyperplanes and, analogously, a generalized notion of margins for multiclass problems" (Crammer, 2001).

SVMlight is considered the reference implementation for Support Vector Machine text classifiers. Two versions are available: SVMlight for traditional binary classification, and SVM Multiclass, which uses the Crammer and Singer method for multi-class classification. The default kernels in SVMlight and SVM Multiclass are different; the SVM Multiclass kernel has the advantage of being much faster than the default SVMlight kernel.

3.6.3 Choosing the Classifier

Initial work for this research began with Mallet. It offered several advantages: it provided Maximum Entropy modeling, automatic 10-way cross-validation, and excellent speed. However, it had one drawback: the maximization algorithm would error out with large feature sets and deliver suboptimal results, so it had to be scrapped. This forced the use of SVM Multiclass for the classifier. One advantage of using SVM Multiclass is that features have to be extracted to SVMlight files, which can also be used by Mallet. (Mallet vector files are not compatible with SVM Multiclass, however.)

3.7 Measuring Performance

To test the performance of the two- and three-way classifiers, raw data will be put into confusion matrices (see Section 3.7.1). From these matrices accuracy, precision and recall are calculated as stated in Formulas 3-1, 3-2 and 3-3. Precision and recall can be calculated separately for ham, spam and phish. For the purposes of measuring performance, precision and recall will be calculated for ham only, unless otherwise stated.

3.7.1 Confusion Matrix

A confusion matrix provides a way to summarize and tabulate classifier performance. For this thesis, truth will be represented in rows, while the prediction of the classifier will be identified by the column. For the two-class configuration, the confusion matrix will be constructed as shown in Table 3-2, where n(h→h) represents the number of ham messages classified as ham, n(h→s) represents the number of ham messages incorrectly classified as spam, n(s→h) represents the number of spam messages incorrectly classified as ham, and n(s→s) represents the number of spam messages classified as spam (Kanaris, 2006).

Table 3-2: Binary Confusion Matrix

         Ham       Spam
Ham      n(h→h)    n(h→s)
Spam     n(s→h)    n(s→s)

In the binary case, categories are often described as positive and negative (see Section 3.6.2). In this thesis, ham messages are considered positive, while spam and phish messages are considered negative. True positives (tp) are positive messages that were properly classified. Consequently, true negatives (tn) are negative messages that were properly classified. False positives (fp) are positive messages that were incorrectly classified (in our case, ham messages classified as spam) and false negatives (fn) are negative messages that were incorrectly classified (see Table 3-3).

Table 3-3: Binary Confusion Matrix with Alternate Terminology

         Ham    Spam
Ham      tp     fp
Spam     fn     tn

3.7.2 Accuracy

Accuracy is measured by taking the number of messages that were correctly classified (whether positive or negative) divided by the total number of messages (Manning, 2000):

accuracy = (tp + tn) / (tp + fp + fn + tn)     (3-1)

For the three-category classification, spam and phish messages will be considered correctly classified if they are not classified as ham (that is, classified as either spam or phish). This approach is acceptable since a spam filter would throw out any emails labeled as either spam or phish.

From the accuracy we can also obtain the error rate, which is simply 1 minus the accuracy.

3.7.3 Precision

Ham precision is calculated by taking the number of ham messages properly identified as ham (true positives), divided by the total number of messages identified as ham (Kanaris, 2006):

precision = tp / (tp + fn)     (3-2)

Precision measures how many of the messages classified as ham were actually ham, providing an indication of filter safety (Kanaris, 2006). This measurement can also be computed in terms of spam or phish.

3.7.4 Recall

Recall is computed by taking the number of ham messages properly identified as ham divided by the total number of ham messages:

recall = tp / (tp + fp)     (3-3)

In simple terms, recall measures how many of the known ham messages were correctly classified. This measurement can be considered the effectiveness of the classifier (Kanaris, 2006). This measurement can also be computed in terms of spam or phish.

From the recall we can also obtain the false-positive rate, which is simply 1 minus the recall.
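To make the three formulas concrete, here is a worked computation using the two-class counts reported later in Table 4-5, with this thesis' convention that fp is ham classified as spam and fn is spam classified as ham:

    # Counts from the two-class run in Table 4-5
    tp, fp, fn, tn = 6934, 20, 54, 2344

    accuracy  = (tp + tn) / (tp + fp + fn + tn)  # Formula 3-1
    precision = tp / (tp + fn)                   # Formula 3-2
    recall    = tp / (tp + fp)                   # Formula 3-3

    print(f"{accuracy:.2%}, {precision:.2%}, {recall:.2%}")
    # 99.21%, 99.23%, 99.71%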

3.7.5 Student’s t-test, p-value and Statistical Significance

The student’s t-test provides a statistical method to “estimate error” around a mean. When combined with the p-value, it allows for testing “the difference between means” (Ramsey, 2002).

The p-value provides the probability that the two means (and their corresponding distributions) are equal. “If the p-value is small, then either the hypothesis [that the means are equal] is correct

—and the sample happened to be of those rare ones that produce such an unusual [error]–or the hypothesis is incorrect. . . . The p-value . . . therefore provides a measure of credibility” (Ramsey,

2002). Thus, the p-value can be considered as the probability that the two means are not equal. In statistics, a statistically significant result is one where we can say with 95% certainty that it is true. Consequently, in order to determine with 95% confidence that two means (or in this case, two classification results) are different, we need a corresponding p-value of 0.05 from the two different results. The computation of the p-value is quite complex. Fortunately, statistical software and several websites make it really easy to compute from two sets of data points.

The 10-way cross-validation explained in Section 3.5 provides the perfect foundation to statistically verify the difference between the two-way and three-way classifiers. By taking the accuracy of each of the 10 classification runs and providing them to a t-test calculator, I will be able to determine with 95% confidence whether the three-way classifier results are different from the two-way classifier results and therefore statistically significant.
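For example, with the third-party SciPy library the p-value for two sets of ten fold accuracies can be computed directly; the accuracy lists below are placeholders, not measured values:

    from scipy import stats

    two_way   = [0.9921, 0.9918, 0.9925, 0.9920, 0.9923,
                 0.9919, 0.9922, 0.9921, 0.9924, 0.9920]  # placeholder fold accuracies
    three_way = [0.9922, 0.9919, 0.9926, 0.9921, 0.9923,
                 0.9920, 0.9923, 0.9922, 0.9924, 0.9921]  # placeholder fold accuracies

    t_stat, p_value = stats.ttest_ind(two_way, three_way)
    print(p_value < 0.05)  # True would indicate a statistically significant difference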

4 RESULTS AND DATA ANALYSIS

4.1 Building the Corpora

The corpora were built from the SpamAssassin corpus, which contains both spam and ham. For the phish corpus, Jose Nazario's third phishing corpus was used (see Section 3.3). The total number of messages in each of the corpora is given in Table 4-1.

Table 4-1: Number of Messages in Corpora

Corpus   Number of Messages   Wanted/Unwanted Totals
Ham      6,951                6,951
Phish    2,279                4,677
Spam     2,398

4.1.1 Splitting the Corpora

As explained in Section 3.2, each corpus needs to be split into 10 equal parts in order to do the 10-way cross-validation. Since none of the message counts are exactly divisible by 10, it was necessary to find a mechanism that would ensure all 10 parts would be as balanced as possible. (An unbalanced approach would be to divide the phish corpus into 9 parts of 227 messages with the 10th part having 236, or into 9 parts of 228 with the 10th having only 227.)

To split the corpora evenly, I devised a simple rounding mechanism to determine where messages should be placed. Here is an example. Assume 16 messages are to be divided into 6 groups. Dividing the number of messages by the number of groups yields 2.667 messages per group. Since it is not possible to divide individual elements, traditional rounding is used to find the number of elements in each group. Group 1 should contain documents 1 through 2.667. Rounding 2.667 up to 3 means documents 1, 2 and 3 are put into Group 1. The next range should contain the next document (4 in this case) through the next division point, 5.333. Rounding 5.333 down to 5 means Group 2 contains documents 4 and 5. The next division point is 8, so Group 3 contains documents 6, 7 and 8. These numbers are then repeated for Groups 4, 5 and 6, which contain 3, 2 and 3 documents respectively. With this technique, deficiencies and remainders are divided equally and balanced among all the groups. Table 4-2 lists the groups and the number of messages from each corpus. (A sketch of this mechanism appears after the table.)

Table 4-2: Split Corpora Group Breakdown

Group   Ham   Phish   Spam
01      696   227     240
02      695   228     240
03      695   228     239
04      695   228     240
05      695   228     240
06      695   228     240
07      695   228     240
08      695   228     239
09      695   228     240
10      695   228     240
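A minimal sketch of this rounding mechanism (my reconstruction of the idea, not the code from Appendix E); note that Python 3's round() uses banker's rounding at exact halves, which can differ from traditional rounding in edge cases:

    def split_into_groups(items, k):
        """Split items into k groups whose sizes differ by at most one message."""
        n = len(items)
        cut = [round(i * n / k) for i in range(k + 1)]  # rounded division points
        return [items[cut[i]:cut[i + 1]] for i in range(k)]

    # The example from the text: 16 messages into 6 groups
    sizes = [len(g) for g in split_into_groups(list(range(16)), 6)]
    print(sizes)  # [3, 2, 3, 3, 2, 3]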

One more issue remains: which messages should go into each group. For the ham and spam corpora from SpamAssassin, messages were stored one per file. The messages were divided by sorting them by filename in case-sensitive alphabetic order (the normal output of the ls command in Unix). The messages were then sequentially renamed to their ordered sequence number with leading zeros (0000, 0001, 0002, 0003, etc.). Then, as listed in Table 4-2, the appropriate number of messages was placed into each group.

For the phish corpus the messages come in the mbox format (all emails in one file). To separate the emails into individual files, it was necessary to import the mbox file into an email program (in this case, KMail 1.7.x). From there, the messages were moved to a maildir mailbox and the resulting individual files were divided in the same way as the ham and spam corpora. It is assumed the messages were separated in the order they appeared in the mbox file.

4.2 Extracting Features

Two types of features were used: continuous and binary. Continuous features count the number of times a particular feature is found. For example, for unigrams, a continuous feature counts the number of times a particular character (such as the letter A) is found. Another example of a continuous feature is the number of attachments an email contains. Binary features do not count occurrences; they only record that a particular feature exists in the document, for example, whether the email contained attachments. The inspiration for the features will be explained as the features are created. (The source code for the feature extractor class and the associated executable, both written in Python 3, can be found in Appendix E.)
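As a small illustration of the two feature types (the names here are hypothetical, not those used by the Appendix E extractor):

    def attachment_features(attachment_count):
        """One continuous feature and its binary counterpart for the same property."""
        return {
            "num_attachments": attachment_count,             # continuous: how many
            "has_attachments": 1 if attachment_count else 0  # binary: any at all
        }

    print(attachment_features(3))  # {'num_attachments': 3, 'has_attachments': 1}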

For all feature engineering iterations, the purpose was to create a respectable email text classifier/spam filter. Thus, a full 10-way cross-validation was performed using the ham and spam corpora. Phish was not included at this stage in order to focus on building a solid spam filter.

4.2.1 Unigrams, Bigrams and Trigrams

As explained and justified in Section 3.4, all text-based features will use character-based n-grams. Kanaris suggests n-grams of sizes 4, 5 and 6 yield the best classifier performance. However, to keep things simple, n-grams of sizes 1 (unigrams), 2 (bigrams) and 3 (trigrams) were used. Figure 4-1 shows this approach yields acceptable performance (over 95% accuracy) when using unigrams and bigrams, or unigrams, bigrams and trigrams, on the body of the message.

[Figure 4-1: Performance Comparison of Unigram, Bigram and Trigram Distributions for Ham and Spam (bar chart of accuracy, precision and recall)]

4.2.2 Header Features

As suggested by Graham and Kanaris, email messages contain structure which can be used in addition to pure text classification (Graham, 2003). This structure can be represented as additional features. For example, a separate character distribution can be created for the characters in the subject, to/from headers, etc. Since it is often possible for humans to detect spam simply by looking at the email's subject, the subject header was added as its own distribution. Seeing as the to/from headers are often forged and the forgery is often evident to humans, to/from headers were also included, each as its own distribution. Another forged header rarely seen by users is the Return-Path (the email address presented during the MAIL FROM SMTP command). This was added as well, at the suggestion that the Return-Path yields effective results in detecting spam (Treviño, 2007a). A quick run with these new header features showed a pattern in the errors incurred: many of the misclassified ham messages came from mailing lists. Thus, the Mailing-List and List-Id headers were added. The performance of these new features, both by themselves and combined with the body, is summarized in Figure 4-2.

[Figure 4-2: Performance Comparison of Features from Headers and Body (accuracy, precision and recall for headers only, headers + body, and body only)]

4.2.3 Splitting Text and HTML Message Parts

Continuing with the concept of structure, it seems logical to split the plain text and HTML versions of the email into separate distributions. This would help fingerprint different body parts separately and hopefully improve performance. Additionally, counting the number of plain text parts, HTML parts and other parts (such as attachments) seemed logical as well. This helps identify messages that only contain plain text or HTML, or any other format combination. Finally, while analyzing the classifier errors, I noticed some misclassified spam messages had both plain text and HTML parts, but the plain text part was completely empty. Another feature was therefore added to record that a part had been defined in the multi-part message structure but did not have any real content in it.

The performance when these features are added is summarized in Figure 4-3. The blank-part feature helped eliminate spam messages with blank text parts. Unfortunately, adding the count feature affected performance, and splitting the multi-part messages into separate distributions hurt performance slightly.

[Figure 4-3: Performance with Multi-part Counts and Splitting Multi-part Messages (accuracy, precision and recall for headers + body + counts, headers + body, and headers + split body + counts)]

4.2.4 Using the Start and End of the Message Body

By this point in the feature extraction process it had become apparent that the more features are added, the longer training takes. Humans seldom read every word in an email before determining it is unwanted. Thus, it should be possible to extract features only from the beginning and/or end of the message. I created an alternate feature set where only the first 1024 and last 1024 characters of the text and HTML parts are used. For messages with fewer than 2048 characters, the entire text is used. The classification performance of the short body compared to the entire body of the message is summarized in Figure 4-4. (A sketch of the truncation appears after the figure.)

[Figure 4-4: Performance Comparison Between Full and Short Body (accuracy, precision and recall)]
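The truncation itself can be sketched as follows (my reconstruction of the idea, not the Appendix E code):

    def shorten_body(text, head=1024, tail=1024):
        """Keep only the first and last 1024 characters; shorter texts pass through whole."""
        if len(text) <= head + tail:
            return text
        return text[:head] + text[-tail:]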

Because the amount of data that is actually processed is reduced, short body features bring a noticeable improvement in speed, as seen in Figure 4-5.

[Figure 4-5: Execution Run Time Comparison Between Full and Short Body (Headers + Short Body: 0:10:39; Headers + Body: 0:16:39)]

Extracting features from the beginning and end of the body has the added benefit of providing basic content length normalization, ensuring long messages are not weighted more heavily than shorter messages. Joachims provides several mechanisms for normalizing continuous features (Joachims, 2002). SVMlight also provides kernels for these normalized feature sets. However, when they were used, training took considerably longer and performance dropped considerably. I was not sure why classification performance dropped, so feature normalization was abandoned.

4.2.5 Phish-Specific Features

Fette suggests 10 features which may help detect phishing emails by "[extracting] information that can be used to detect deception" (Fette, 2007). Some of those features use DNS record analysis, which is hard to implement after the fact. However, the URL analysis features they suggest appear to be quite useful. I programmed the feature extractor to look for four of these features in the HTML part of the messages. These were programmed as binary features and are described in Table 4-3.

Table 4-3: Binary URL Features

Feature                  Description
Raw URL                  The text of the URL is a URL (no human-readable link description): http://www.google.com
Non-matching Raw URL     The text of the URL is a URL, but it does not match the href property of the anchor tag: http://www.yahoo.com
"Click here" URL         The text of the URL contains variations of the phrase click here: CLICK HERE
IP URL                   The href property points to an IP address, suggesting a rogue site: CLICK HERE

In addition to the binary features, an additional continuous feature was added: the number of domains referenced. Phishing attacks often link both to the real site and to the rogue (fake) site set up by the attacker (Fette, 2007).
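A minimal sketch of how these URL features could be detected with Python's standard html.parser module follows; it is an illustration under my own naming, not the Appendix E implementation:

    import re
    from html.parser import HTMLParser

    IP_URL = re.compile(r"^https?://\d{1,3}(?:\.\d{1,3}){3}")
    DOMAIN = re.compile(r"^https?://([^/]+)")

    class URLFeatureParser(HTMLParser):
        """Collects the four binary URL features plus the referenced-domain count."""
        def __init__(self):
            super().__init__()
            self.href = None
            self.flags = {"raw_url": 0, "nonmatching_raw_url": 0,
                          "click_here_url": 0, "ip_url": 0}
            self.domains = set()

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.href = dict(attrs).get("href", "") or ""
                m = DOMAIN.match(self.href)
                if m:
                    self.domains.add(m.group(1).lower())
                if IP_URL.match(self.href):
                    self.flags["ip_url"] = 1

        def handle_data(self, data):
            if self.href is None:
                return
            text = data.strip()
            if "click here" in text.lower():
                self.flags["click_here_url"] = 1
            if text.startswith(("http://", "https://")):
                self.flags["raw_url"] = 1
                if text != self.href:
                    self.flags["nonmatching_raw_url"] = 1

        def handle_endtag(self, tag):
            if tag == "a":
                self.href = None

    parser = URLFeatureParser()
    parser.feed('<a href="http://192.0.2.1/login">CLICK HERE</a>')
    print(parser.flags, len(parser.domains))  # ip_url and click_here_url set; 1 domain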

The addition of the URL analysis features improved performance slightly, as summarized in Figure 4-6.

[Figure 4-6: Performance of URL Features (accuracy, precision and recall for headers + body + URL, headers + body, and headers + short body + URL)]

4.2.6 Message Parsing Error Features

While writing the feature extractor, it became necessary to properly handle Python's message parsing errors. Assuming legitimate messages should not contain parsing errors but spam or phish might, I decided to add two binary features that would record these errors. The first feature records when Python is unable to parse the email message. The second records when Python encounters a character decoding error in the message. Python 3 is able to ignore the character decoding error, allowing the feature extractor to extract the features from the text, the offending characters, and the fact that offending characters existed in the message.
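A minimal sketch of these two features (an illustration under assumed names, not the Appendix E code):

    import email

    def parsing_error_features(raw_bytes, charset="utf-8"):
        """Two binary features: message parse failure and character decoding failure."""
        feats = {"parse_error": 0, "decode_error": 0}
        try:
            email.message_from_bytes(raw_bytes)
        except Exception:
            feats["parse_error"] = 1
        try:
            raw_bytes.decode(charset)
        except UnicodeDecodeError:
            # the text remains usable via raw_bytes.decode(charset, errors="ignore")
            feats["decode_error"] = 1
        return feats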

4.2.7 Corrections to the Corpora

During the feature engineering cycle there were four messages from the phishing corpus which were consistently categorized as ham. As it turns out, the classifier was right: three of these messages (0118, 0768, and 0908) were clearly legitimate. To correct this error, these messages were moved from the phishing corpus to the ham corpus. The fourth message was suspicious because it was a phish message which had been forwarded to Jose Nazario (Figure 4-7).

Message 1352 posed an interesting dilemma: is the message actually unwanted? The original message certainly was. But if a known recipient emails you an unwanted email (as in this case, for research), then certainly this message is wanted. Additionally, the sender forwarded a text version of the message, removing the attack's payload. Interestingly, the classifier believed the message was wanted and consistently classified it as ham, which I believe to be correct. At first, moving it to the ham corpus seemed like the right thing to do. However, the author of the corpus believed that message to be phish. Keeping the message in the phish corpus would add an additional error to the results; moving the message would introduce my bias into the corpus. The best and simplest solution I came up with was to simply throw the message away.

From: David Helder
Subject: Weird phish
To: user

For your collection.

Begin forwarded message:
> From: "eBay"
> Date: January 19, 2007 9:55:11 PM GMT-05:00
> To: [email protected]
> Subject: Your opinion counts
>
> January 19, 2007
>
> Dear eBay Community:
>
> We have decided to close eBay on 27 February 2007 due to the
> repeatedly abuses on our company. We ask your opinion on this
> matter and we want to know if you agree with us or disagree .Below
> you can make your choice.
>
> If you want eBay to stay open click YES otherwise click NO .Your
> opinion is very important to us. If 50% of the eBay members vote
> positive eBay stays open otherwise it will be closed.
>
> Regards,
> eBay Team
>
> YES NO
>
> eBay sent this message to you based on your good account standing.
> Your registered name is included to show this message originated
> from eBay.

-- David Helder - [email protected] - http://www.gizmolabs.org/~dhelder

Figure 4-7: Phish Message 1352

To avoid having to redo much of the work that had already been done, I decided to simply move the three mislabeled messages to their corresponding ham groups. The fourth problematic email was simply thrown out. However, I did not rebalance the numbers in each group, creating slightly imbalanced groups. Since the changes are small, I did not feel this was detrimental to the analysis or results. In the interest of full disclosure, Table 4-4 contains the final number of messages in each group. The numbers marked with an asterisk represent the changes from Table 4-2.

Table 4-4: Final Corpora Group Breakdown

Group   Ham    Phish   Spam
01      696    227     240
02      695    228     240
03      695    228     239
04      697*   226*    240
05      696*   228     240
06      695    226*    240
07      695    228     240
08      695    228     239
09      695    228     240
10      695    228     240

4.2.8 Feature Summary

Appendix A contains a full list of the feature sets which can be extracted by the feature extractor. Each feature set consists of one or more of the features described in Section 4.2. Each feature set was given a descriptive name which is used throughout the text and is included in the table.

The best feature combinations use headers and a single n-gram distribution of the message body, with the addition of the HTML text (stripped of tags). Using the shortened body as described in Section 4.2.4 yields very similar performance but greatly reduces computational running time. Because of their performance, classifier performance comparisons will use variations of these features.

4.3 Maximizing SVM Performance

SVMlight includes a parameter, c, which specifies "the tradeoff between training error and margin" (SVMmulticlass). In short, the higher the number, the better the filter performance, at the expense of training time. To maximize filter performance it is necessary to find the right value. To find the optimum value of c, I ran a series of tests on some of the best feature sets with different values of the tradeoff parameter. The values tested were 0.1, 1, 10, 100, 500, 1,000, 5,000 and 10,000. The accuracy of the filter at each value of c was recorded along with its total run time (training + classification) for the 10-way cross-validation. Performance peaked at around 1,000; values of c above 1,000 rarely yielded better results, and for many of the feature sets the best performance was achieved with a value of 500. (The full results are found in Appendix B.) Figure 4-8 summarizes the classification performance for the various values of c, while Figure 4-9 summarizes the total running time, which increases proportionally with classifier performance. (A scripted sweep over these values is sketched after the figures.)

[Figure 4-8: Classification Performance for Various Values of c (accuracy, precision and recall for c = 0.1 through 10,000)]

[Figure 4-9: Running Time for Various Values of c (c = 0.1: 0:03:18; 1: 0:04:59; 10: 0:07:13; 100: 0:10:30; 500: 0:18:10; 1,000: 0:18:57; 5,000: 0:18:59; 10,000: 0:18:59)]
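For reference, such a sweep can be scripted around the SVM Multiclass command-line tools; the file names below are placeholders, and it is assumed svm_multiclass_learn and svm_multiclass_classify are on the PATH:

    import subprocess

    for c in (0.1, 1, 10, 100, 500, 1000, 5000, 10000):
        # -c is SVM Multiclass' tradeoff between training error and margin
        subprocess.run(["svm_multiclass_learn", "-c", str(c),
                        "train.dat", f"model_c{c}"], check=True)
        subprocess.run(["svm_multiclass_classify", "test.dat",
                        f"model_c{c}", f"predictions_c{c}"], check=True)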

4.4 Confirming the Phishing Problem

With several high-performance feature sets and a tuned classifier, it is possible to test the phish corpus. The first test classifies the entire phish corpus with a classifier model trained entirely on the ham and spam corpora with no phish. By blindly classifying phishing emails without any prior knowledge, we will be able to quantify how the classifier behaves. If phishing emails truly resemble legitimate email more than spam, we should see most of the phish corpus classified as ham and, consequently, achieve less than 50% accuracy. If the classifier is truly confused, accuracy should (in theory) be around 50%. If the classifier believes most of these messages are unwanted, accuracy will be above 50%. Figure 4-10 shows the results of this first test with feature sets that did not use Fette's URL features. Figure 4-11 shows the results with the URL features. Appendix B contains the full results.

[Figure 4-10: Blind Phish Classification Results (accuracy: Headers + Short Body 88.3%; Headers + Body + HTML Text 81.3%; Headers + Short Body + HTML Text 73.5%)]

These results are surprising. The data suggests that the phishing emails in the phish corpus look more like spam. The range and confidence interval of the data lead to the conclusion (with 95% confidence) that phish resembles spam much more than it does ham (see Figure 4-12).

[Figure 4-11: Blind Phish Classification with URL Features (accuracy: Headers + Short Body + HTML + URL 93.5%; Headers + Body + HTML + URL 75.1%; Headers + Short Body + URL 74.6%)]

[Figure 4-12: Accuracy Range and Skew of Blind Phish Classifier]

4.5 Results of the Two-Way Classifier

The next test establishes the baseline performance for testing my hypothesis. For this test the phishing corpus was added to the spam corpus to train a model with a traditional two-class configuration. Figure 4-13 provides a summary of the results.

63 100.0% 99.7% 99.7% 99.8% 99.6% 99.6% 99.4% 99.4% 99.3% 99.4% 99.2% 99.2% 99.2% 99.2% 99.0% Accuracy Precision 98.8% Recall 98.6% 98.4% 98.2% 98.0% Headers + Body + HTML + URL Headers + Short Body + HTML Text Headers + Short Body + URL

Figure 4-13: Classifier Performance for Ham, Spam and Phish in Traditional 2-Way Configuration

The baseline performance remained consistent with the original performance without the phishing emails. The data in Appendix B suggests there was no statistically significant drop in performance. Figure 4-14 shows the performance of the Headers + Short Body + URL feature set with and without phishing emails side by side.

[Figure 4-14: Comparison of Classifier Performance With and Without Phish Corpus (accuracy, precision and recall)]

Comparing the two results shows comparable numbers, with a slight increase in accuracy when the phish corpus was introduced. Table 4-5 provides the confusion matrix and class-specific recall, showing no significant changes in performance.

Table 4-5: Classifier Performance Comparison in a Two-Class Configuration

Ham and Spam:
         Ham    Spam   Recall
Ham      6934   20     99.71%
Spam     54     2344   97.75%

Ham, Spam and Phish:
         Ham    Spam   Recall
Ham      6933   21     99.70%
Spam     49     2349   97.96%
Phish    6      2263   99.74%

4.6 Results of the Three-Way Classifier

The next step is to test the performance of the classifier in the three-class configuration. The results are summarized in Figure 4-15 and compared to the two-class configuration in Figure 4-16.

[Figure 4-15: Performance of the Three-Way Classifier (c = 500; accuracy, precision and recall for three feature sets)]

[Figure 4-16: Performance of Two-way vs. Three-way Classifiers (Headers + Short Body + URL, c = 500)]

These results are quite intriguing. Accuracy improved 0.01%, while precision and recall improved by 0.02% and 0.01% respectively. Table 4-6 contains the raw data along with the recall for spam and phish, while Figure 4-17 shows that runtime nearly tripled with the three-way classifier.

Table 4-6: Performance Comparison of Two-way vs. Three-way Classifiers with Class-specific Recall (c = 500)

Two-way Classifier:
         Ham    Spam   Recall
Ham      6934   20     99.71%
Spam     49     2349   97.96%
Phish    3      2272   99.87%

Three-way Classifier:
         Ham    Spam   Phish   Recall
Ham      6934   14     6       99.71%
Spam     50     2344   4       97.75%
Phish    4      21     2250    98.90%

[Figure 4-17: Running Time Comparison Between Two-way and Three-way Classification (Short body, two-way: 0:25:06; Short body, three-way: 1:10:56; Full body, two-way: 1:16:35; Full body, three-way: 3:48:52)]

The result of the t-test on the data shows whether the differences in performance are statistically significant. As given in Appendix D, the p-value comparing the two results is 0.48, which means the differences in performance are statistically insignificant. This means there is no significant performance to be gained by using a three-way classifier (see Figure 4-18). The original hypothesis that a three-way classifier will improve phishing detection is proved incorrect.

[Figure 4-18: Comparison of Mean and Variance of the Accuracy of Two-way vs. Three-way Classifier]

4.7 Further Analysis of the Results

Another way of explaining the results is that the three-way classifier was not able to beat the traditional two-way classifier because a two-way classifier already achieves similar performance. This contradicts the literature, which suggests content filters cannot deal with phishing emails as efficiently as with regular spam (Fette 2007, 1). In fact, the data in Table 4-6 suggests the classifier was able to differentiate phish better than spam. This discrepancy is something I explored further to try to find the reason why other researchers reached their conclusions.

Before trying to explain the discrepancy, I must first offer a preliminary conclusion: even though phishing emails are designed to appear legitimate, they are not so legitimate that text classifiers have a hard time distinguishing them. With proper training, non-generative text classifiers are capable of finding enough differences between phishing and legitimate emails to tell them apart. This is the perspective I used to explain the discrepancy between my findings and those found in the literature.

4.7.1 Corpora

There is a possibility the corpora themselves may introduce features that help the classifier discriminate phishing emails. As mentioned in Section 3.3, the ham and spam corpora came from the SpamAssassin corpus, while the phishing corpus came from Jose Nazario. Each source took its own unique precautions to ensure the privacy of the emails by modifying some of the header information. For example, many of the messages in the phish corpus had their destination domain changed to example.com (see Figure 4-7). These modifications may be enough to provide a hint to the classifier about which corpus the message came from.

[Figure 4-19: Performance Comparison on the Effect of Headers for Ham vs. Spam + Phish (c = 500; accuracy, precision and recall for body only and headers + body)]

A simple way to check the effect of these inadvertent hints is to remove the headers from the feature set and focus entirely on the body of the message. Figure 4-19 compares the classification results with and without headers. From this figure it is possible to see the classifier still provides adequate performance without the headers, so any bias that may be introduced should be negligible.

4.7.2 Generative vs. Non-generative Classifiers

Another factor contributing to the high performance of the two-way classifier may be the use of a non-generative classifier. As explained in Section 2.5.5, most known email content filters use variations of Naive Bayes classifiers. Perhaps Bayesian classifiers are prone to confusion when phish is introduced into the corpus.

To test this possible explanation, I ran the feature sets through Mallet's Naive Bayes classifier to see how it would perform with the same corpora and feature sets, both with and without phish. I used Mallet's built-in 10-way cross-validation feature, which randomly divides the corpora into 10 equal parts, and ran the 10-way cross-validation in the same way the previous testing was conducted: ham vs. spam, ham vs. spam + phish, and ham vs. spam vs. phish. Since Mallet also provides a Maximum Entropy classifier, both were run to provide validation on the performance of non-generative classifiers.

The results of the first test (ham vs. spam) immediately show the difference in classifier performance between generative (i.e., Naive Bayes) and non-generative (i.e., MaxEnt and SVM) classifiers, with Naive Bayes' performance being much lower than that of SVM or MaxEnt (see Figure 4-20). We also see that using the URL features improves non-generative performance while it degrades generative performance. This shows feature selection affects generative classifiers much more than non-generative ones.

[Figure 4-20: Accuracy Comparison Between Generative and Non-generative Models (Ham vs. Spam)]

For the second test, the phish corpus was added to the spam corpus and the test was run again (see Figure 4-21). Surprisingly, the performance of all three classifiers increased when phish was introduced to the corpora. This is due to the fact that each classifier was able to classify phish more accurately than ham or spam (see Appendix B for category-specific performance for SVM). These results show that with proper training, even Naive Bayes models are able to handle phishing emails.

[Figure 4-21: Accuracy Comparison Between Generative and Non-generative Models (Ham vs. Spam + Phish)]

Finally, the three-way classifier offered similar performance to the two-way classifier when using MaxEnt, slightly worse performance using SVM, and significantly worse performance with Naive Bayes, showing once again that using three categories does not improve filtering performance (see Figure 4-22).

[Figure 4-22 data: Naive Bayes 82.0% and 77.5%; MaxEnt 99.4% and 99.4%; SVM 99.0% and 99.2%, for the two feature sets respectively.]

Figure 4-22: Accuracy Comparison Between Generative and Non-generative Models (Ham vs. Spam vs. Phish)

4.7.3 Feature Selection and Reduction

As was evident during the previous tests, feature selection appears to have a large impact on the performance of a Naive Bayes classifier. To improve performance with Naive Bayes, several sources suggest reducing the number of features. For example, Graham states that "another effect of a larger vocabulary [feature space] is that when you look at an incoming mail you find more interesting tokens [features], meaning those with probabilities far from .5. I use the 15 most interesting to decide if mail is spam" (Graham, 2003). Houser does not limit his filter to 15 features: '"interesting tokens" were tokens in the message with a spam statistic "greater than 0.95" and "less than 0.05"' (Houser, n.d.). These types of reductions make sense, especially in probabilistic classifiers like Graham's and Houser's (variations of Naive Bayes), since they allow the filter to focus on the most telling features ("spamminess") that are most helpful to the classification problem. For example, when considering word-based features, ignoring common words such as "the," "a," "an," "for," "to," etc. allows the classifier to focus on more important spam features such as "buy," "Viagra," and "now!!!" However, the "interesting features" that make spam so obvious to the classifier are mostly absent from phish. Phishing attacks don't try to get you to "buy Viagra now!!!" Thus, feature reduction deprives classifiers of the phish-specific features, causing a drop in classification performance.
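
A minimal sketch of this style of reduction, assuming a hypothetical mapping from tokens to estimated spam probabilities:

def most_interesting(token_probs, n=15):
    """Graham-style feature reduction: keep the n tokens whose estimated
    spam probability lies farthest from the neutral value of 0.5."""
    return sorted(token_probs,
                  key=lambda tok: abs(token_probs[tok] - 0.5),
                  reverse=True)[:n]

# Hypothetical per-token spam probabilities
probs = {"the": 0.51, "meeting": 0.08, "viagra": 0.99, "now!!!": 0.97}
print(most_interesting(probs, n=2))  # ['viagra', 'now!!!']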

Figure 4-23 explores the performance variability as a product of the feature set. Taking the accuracies of all common feature sets for the three classifiers from Appendix C, we can see Naive Bayes has a tremendous amount of variance in performance compared to the non-generative classifiers. Thus we see that variations in features, whether from the source data or the feature extractor, disrupt Naive Bayes classifiers but have little effect on non-generative classifiers.

Figure 4-23: Performance Variance for Naive Bayes, Maximum Entropy and SVM Using Different Feature Sets

4.8 Validating the Results

To validate the results and test the validity of the assumptions in Section 4.7, the experiments were performed again on new corpora. If the results and assumptions are valid, they should be reproducible across different types of data.

The corpora chosen for validation are the first 10,000 entries of the 2005 TREC Public Spam Corpus for ham and spam, and sets 0, 1 (20051114) and 2 of Jose Nazario's phish corpus. These corpora provide a volume of emails equivalent to the original corpora, although the numbers of messages in the ham and spam corpora are reversed, as illustrated in Table 4-7. The high proportion of unwanted messages should also provide a more realistic testing scenario.

Table 4-7: Number of Messages in Corpora

Corpus | Number of Messages | Wanted/Unwanted Totals
Ham    | 2,126              | 2,126 (wanted)
Phish  | 7,874              | 10,154 (unwanted)
Spam   | 2,280              |

The data was run through the Short Body + HTML + URL feature set. This feature set is nearly identical to one of the best feature sets, except that no features were extracted from any headers. This avoids any bias that may be introduced through the headers and focuses purely on content.

The results confirm all the findings (see Appendix B). First, there was no degradation in performance when phish was introduced. Second, blind phish classification results lean towards spam (95% of phish emails were blindly classified as spam in this case). Third, the three-way classifier yielded nearly identical performance to the two-way classifier (see Figure 4-24).

[Figure 4-24 data: two-way vs. three-way — accuracy 98.17% / 98.18%, precision 92.72% / 92.68%, recall 97.04% / 97.13%.]

Figure 4-24: Performance of Two-way vs. Three-way Classifiers with the 2005 TREC Corpus (Short Body + HTML + URL, c = 500)

Testing these results for statistical significance also leads to the same conclusion: the difference in performance is statistically insignificant (p-value: 0.9805). This confirms the finding that there is no performance to be gained by using three-way classifiers (see Appendix D). The performance can be easily compared in Figure 4-25.
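
The thesis does not name the exact statistical test, but assuming an unpaired two-sample t-test over the ten cross-validation groups, the reported p-value can be reproduced from the Appendix D accuracies:

from scipy import stats

# Per-group accuracies from Appendix D (2005 TREC corpus, c = 500)
two_way   = [96.987, 98.371, 98.697, 96.906, 99.104,
             98.941, 97.801, 98.453, 98.779, 97.638]  # ham vs. spam + phish
three_way = [97.476, 98.371, 98.697, 96.743, 99.186,
             98.453, 97.883, 98.453, 98.779, 97.720]  # ham vs. spam vs. phish

t_stat, p_value = stats.ttest_ind(two_way, three_way)
print(round(p_value, 4))  # ~0.9805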

Figure 4-25: Comparison of Mean and Variance of the Accuracy of Two-way vs. Three-way Classifier for the 2005 TREC Corpus

5 CONCLUSIONS AND RECOMMENDATIONS

5.1 Conclusions

The purpose of the research for this thesis was to see if a spam filter’s performance drop caused by phishing attacks could be reduced by creating a third classification category for phish.

To test this hypothesis I gathered public email corpora with legitimate email (ham), spam and phishing emails and built a text classifier using a Support Vector Machine (SVM) to test how well it could classify the corpora in a traditional two-class configuration. Then, I classified the corpora using my proposed three-class configuration and compared the results.

The results and their analysis strongly suggest there is no statistically significant change in performance by using a three-way classifier. The difference in mean performance was 0.12% with a p-value of 0.48 (see Section 4.6). Additionally, the three-way classifier took three times more computing time than its two-way counterpart.

Part of the reason why the three-way classifier did not show a statistically significant improvement in performance is that the traditional two-way classifier performed admirably when phishing emails were introduced (see Section 4.5). Its performance was surprising since it contradicts the literature, which suggests phishing emails cause a drop in classifier performance.

I further explored the apparent discrepancy between the literature and my results by running the same data through the same tests with a Naive Bayes classifier. Two things were very apparent. First, Naive Bayes had substantially lower performance than SVM on the same feature sets (see Section 4.7.2). Second, the variance in Naive Bayes' classification performance was so large it proved variations in the training features, whether they come from the source data or the feature extraction process, can significantly disrupt performance (see Section 4.7.3).

In addition to running the data through a Naive Bayes classifier, I also ran the data through a Maximum Entropy classifier. Its performance was on par with SVM and did not suffer from the variability of Naive Bayes (see Sections 4.7.2 and 4.7.3), indicating that these non-generative classifiers are better suited to solve the spam and phish filtering problem.

Finally, to validate the findings, I ran all tests on different corpora to ensure the results remained consistent. These new tests confirmed all findings (see Section 4.8).

5.2 Recommendations

Based on the results, I make four recommendations to email administrators for improving spam and phish filtering.

First recommendation: move away from Naive Bayes or "Bayesian" classifiers to a non-generative classifier such as Maximum Entropy or Support Vector Machines. The data shows both Maximum Entropy and SVM classifiers outperform Naive Bayes classifiers when using generic features (see Appendix C). This will provide administrators with three benefits:

1. Better performance both in terms of accuracy and precision.

2. Less variability in performance.

3. Significantly less time spent managing features and tuning performance.

Second recommendation: move away from word-based features to character-based features. As explained in Section 3.4 and demonstrated in Section 4.2, character-based features are much more robust: they deal better with obfuscation and extend effortlessly to foreign languages. Character-based features need to be paired with non-generative classifiers, as they do not offer good performance with Naive Bayes classifiers.
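
A minimal sketch of the kind of character n-gram counting the Appendix E extractor performs:

def char_ngram_counts(text, n=3):
    """Character n-gram counts: robust to word obfuscation such as
    'V1agra' or 'F R E E', and language-independent since no
    tokenizer is required."""
    counts = {}
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        counts[gram] = counts.get(gram, 0) + 1
    return counts

print(char_ngram_counts("buy v1agra now", 3))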

Third: ensure a proper and representative number of phishing emails is included in the training corpus. The data has shown that adding an adequate number of representative phishing emails to the training corpus allows the classifier to achieve excellent results compared with "blind" phish classification (compare the results in Sections 4.4 and 4.5). What constitutes an adequate number of representative messages will depend on the type and volume of email handled by a particular domain.

Fourth: training data should be divided into two categories, wanted and unwanted. The results suggest there is nothing to be gained by separating the unwanted category into phish and spam (or even other categories). If you are reading this thesis trying to find a way to achieve better phish-filtering performance, follow the first three recommendations and do not create a third category.

5.3 Future Work

Since the corpora are rather old and limited relative to the volume of global email, the results should be confirmed on newer, more representative and diverse corpora.

One of the reasons why Naive Bayes is still heavily used on the Internet is that there is no simple, freely available tool which provides non-generative spam filtering. I believe that if a simple tool that was easy to deploy in multiple environments were available, the Internet would move away from Bayesian filters. The feature extractor in Appendix E provides a good foundation for such a tool.

Even though maximizing the performance of the filter was not a specific goal of this thesis, it became clear that more research should be done into obtaining better filter performance by improving feature extraction. As suggested by Kanaris, using n-grams of size 4, 5 and 6 offers better classification performance. Other features are also possible candidates. For example, a new feature could compute a character distribution over attachment filenames: legitimate attachments appear to have normal names, while those in spam and phishing emails usually don't. Another feature could look at the Received headers, which are often forged by spammers and may carry a spammer's signature.
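
As a starting point, both candidate features are simple to compute with Python's email package; the function and feature names below are hypothetical:

import email

def filename_char_distribution(msg):
    """Hypothetical feature: character-class counts over attachment
    filenames (legitimate attachments tend to have plain names, while
    spam and phish attachments often look random)."""
    counts = {"alpha": 0, "digit": 0, "other": 0}
    for part in msg.walk():
        name = part.get_filename()
        if not name:
            continue
        for ch in name:
            if ch.isalpha():
                counts["alpha"] += 1
            elif ch.isdigit():
                counts["digit"] += 1
            else:
                counts["other"] += 1
    return counts

def received_header_count(msg):
    """Hypothetical feature: number of Received headers, whose forged
    patterns may carry a spammer's signature."""
    return len(msg.get_all("Received") or [])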

Finally, the open source and academic communities need to create a holistic filter at the SMTP level. If content filtering were combined with virus scanning, SPF/Sender ID, tarpitting, and some of the other techniques of Section 2.5, it should be possible to create an effective spam filtering solution that reduces security threats, saves bandwidth, storage and processing, and cuts the amount of unwanted email delivered to recipients.

Significant research remains to be done in the spam filtering arena. In particular, we need to find simple yet effective solutions to the problem. The literature is replete with approaches, but few have provided applicable solutions and fewer still have made it to the open source community.

REFERENCES

2005 TREC Public Spam Corpus. 2005. http://plg.uwaterloo.ca/~gvcormac/treccorpus/ (accessed February 27, 2012).

Allman, E., J. Callas, M. Delany, M. Libbey, J. Fenton, and M. Thomas. "RFC 4871: DomainKeys Identified Mail (DKIM) Signatures". Internet Engineering Task Force, May 2007. http://tools.ietf.org/html/rfc4871.

Androutsopoulos, I. Ling-Spam datasets - Csmining Group, 2000. http://csmining.org/index.php/ling-spam-datasets.html. (accessed February 6, 2012).

Apache Foundation. 2004. ASF Position Regarding Sender ID. September 2, 2004. http://www.apache.org/foundation/docs/sender-id-position.html. (accessed February 6, 2012).

APWG. 2011. Phishing Activity Trends Report, 1st Half 2011. Anti-Phishing Working Group, December 23, 2011. http://www.antiphishing.org/reports/apwg_trends_report_h1_2011.pdf (accessed January 30, 2012).

Barracuda Networks. Barracuda Spam & Virus Firewall - Comprehensive Spam and Virus Protection, n.d. http://www.barracudanetworks.com/ns/products/spam_overview.php. (accessed February 6, 2012).

Bergholz, A., G. Paaß, F. Reichartz, S. Strobel, and J. Chang. 2008. “Improved Phishing Detection using Model-Based Features”. In Conference on Email and Anti-Spam (CEAS).

Caruana, R., and A. Niculescu-Mizil. 2006. An Empirical Comparison of Supervised Learning Algorithms. In Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, PA.

Chhabra, S. 2005. Fighting Spam, Phishing and Email Fraud. Riverside, CA: University of California Riverside. http://www.cs.ucr.edu/~schhabra/ack.pdf.

Crammer, K. and Y. Singer. 2001. On the Algorithmic Implementation of Multi-class SVMs. Journal of Machine Learning Research. December 2001. http://www.cis.uab.edu/zhang/Spam-mining-papers/On.the.Algorithmic.Implementation.of.Multiclass.Kernel.based.Vector.machines.pdf.

Crocker, D. 1982. RFC 822: Standard for the Format of ARPA Internet Text Messages. Internet Engineering Task Force, August 13, 1982. http://tools.ietf.org/html/rfc822.

Dacier, M. 2009. Technical Perspective: They Do Click, Don’t They? Communications of the ACM 52, no. 9 (September 2009).

DMARC.org. Domain-based Message Authentication, Reporting and Conformance, n.d. http://www.dmarc.org/. (accessed February 6, 2012).

Eggendorfer, T. 2007. Reducing spam to 20% of its original value with a SMTP tar pit simulator. In MIT Spam Conference. Cambridge, MA, 2007.

FBI. Common Fraud Schemes, n.d. http://www.fbi.gov/scams-safety/fraud. (accessed February 6, 2012).

FBI. 2009. Spear Phishing. April 1, 2009. http://www.fbi.gov/news/stories/2009/april/spearphishing_040109. (accessed February 6, 2012).

Fette, I., N. Sadeh, and A. Tomasic. 2007. Learning to Detect Phishing Emails. Proceedings of the International World Wide Web Conference (WWW). Banff, Alberta. http://www.cs.cmu.edu/~tomasic/doc/2007/FetteSadehTomasicWWW2007.pdf.

Google. Gmail – Google Apps, n.d. http://www.google.com/apps/intl/en/business/gmail.html. (accessed February 6, 2012).

Goodman, J., G. Cormack, and D. Heckerman. Spam and the Ongoing Battle for the Inbox. Communications of the ACM 50, no. 2 (February 2007): 25-33.

Graham, P. 2003. Better Bayesian Filtering, January 2003. http://www.paulgraham.com/better.html (accessed November 17, 2011).

Hanna, J. ASSP, n.d. http://assp.sourceforge.net/.

Houser, S. Free Unix Shell Statistical Spam Filter and Whitelist, n.d. http://sofbot.com/article/Statistical_spam_filter.html (accessed January 2, 2012).

Hutzler, C., D. Crocker, P. Resnick, E. Allman, and T. Finch. RFC 5068: Email Submission Operations: Access and Accountability Requirements. Internet Engineering Task Force, November 2007. http://tools.ietf.org/html/rfc5068.

Jacob, P. 2003. The SPAM Problem: Moving Beyond RBLs. January 3, 2003. http://www.whirlycott.com/phil/antispam/rbl-bad/rbl-bad.html (accessed February 6, 2012).

Joachims, T. 2002. Learning to Classify Text Using Support Vector Machines: Methods Theory and Algorithms. Norwell, MA: Kluwer Academic Publishers, 2002.

Kavi Help Center. How Email Really Works. http://www.niso.org/khelp/kmlm/user_help/html/how_email_works.html (accessed January 24, 2012).

Kanich, C., C. Kreibich, K. Levchenko, B. Enright, G. Voelker, V. Paxson, and S. Savage. 2009. Spamalytics: An Empirical Analysis of Spam Marketing Conversion. Communications of the ACM 52, no. 9 (September 2009): 99-107.

Kanaris, I., K. Kanaris, and E. Stamatatos. 2006. Spam Detection Using Character N-Grams. 3955:95-104. Greece: Lecture Notes in Artificial Intelligence.

Klensin, J. RFC 2821: Simple Mail Transfer Protocol. Internet Engineering Task Force, April 2001. http://tools.ietf.org/html/rfc2821.

Klensin, J. RFC 5321: Simple Mail Transfer Protocol. Internet Engineering Task Force, October 2008. http://tools.ietf.org/html/rfc5321.

Leyden, J. 2011. IPv6 intro creates spam-filtering nightmare. The Register, March 8, 2011. http://www.theregister.co.uk/2011/03/08/ipv6_spam_filtering_headache/ (accessed March 8, 2011).

Leyden, J. 2012. Kelihos botnet BACK FROM THE DEAD. The Register. London, UK, February 2, 2012. http://www.theregister.co.uk/2012/02/02/kelihos_botnet_returns/ (accessed February 5, 2012).

Lindberg, G. 1999. RFC 2505: Anti-Spam Recommendations for SMTP MTAs. Internet Engineering Task Force, February 1999. http://tools.ietf.org/html/rfc2505.

Lyon, J., and M. Wong. RFC 4406: Sender ID: Authenticating E-Mail. Internet Engineering Task Force, April 2006. http://tools.ietf.org/html/rfc4406.

M86 Security. 2008. Srizbi now leads the spam pack. February 29, 2008. http://www.m86security.com/labs/traceitem.asp?article=567 (accessed April 6, 2011).

MAAWG. 2011. Email Metrics Program: The Network Operators' Perspective, Report #15 – First, Second and Third Quarter 2011. Messaging Anti-Abuse Working Group. http://www.maawg.org/sites/maawg/files/news/MAAWG_2011_Q1Q2Q3_Metrics_Report_15.pdf.

Manning, C. and H. Schütze. 2000. Foundations of Statistical Natural Language Processing. The MIT Press.

MailRoute. MailRoute: Email Protection - Spam and Virus Filtering Services, n.d. http://mailroute.info/services.html#spam (accessed February 8, 2011).

McCallum, A. 2002. MALLET: A Machine Learning for Language Toolkit. http://mallet.cs.umass.edu.

Nazario, J. Monkey.org's Phishing Email Corpus, n.d. http://www.monkey.org/~jose/wiki/doku.php?id=phishingcorpus (accessed February 22, 2011).

NLTK Documentation. n.d. Learning to Classify Text http://nltk.googlecode.com/svn/trunk/doc/book/ch06.html (accessed February 6, 2012).

Ollmann, G. The Phishing Guide (Part 1). http://www.technicalinfo.net/papers/Phishing.html (accessed January 24, 2012).

Partridge, C. 1986. RFC 974: Mail Routing and the Domain System. Internet Engineering Task Force, January 1986. http://tools.ietf.org/html/rfc974.

Postel, J. 1982. RFC 821: Simple Mail Transfer Protocol. Internet Engineering Task Force, August 1982. http://tools.ietf.org/html/rfc821.

Ramsey, F. and D. Schafer. 2002. The Statistical Sleuth: A Course in Methods of Data Analysis. Second ed. Belmont, CA: Brooks/Cole, 2002.

Reynolds, J. K. 1984. RFC 918: Post Office Protocol. Internet Engineering Task Force, October 1984. http://tools.ietf.org/html/rfc918.

Ringger, E. 2005. Proceedings of the Workshop. In ACL 2005 Workshop on Feature Engineering for Machine Learning in Natural Language Processing. Ann Arbor, Michigan, 2005.

Ringger, E. 2012. Personal email to author. February 28, 2012.

Seabrook, A. At 30, Spam Going Nowhere Soon : NPR, May 2008. http://www.npr.org/templates/story/story.php?storyId=90160617 (accessed April 6, 2011).

Securelist, n.d. Types of Spam. http://www.securelist.com/en/threats/spam?chapter=88 (accessed November 23, 2011).

SpamAssassin Public Corpus, n.d. http://spamassassin.apache.org/publiccorpus/ (accessed February 6, 2012).

SpamAssassin. Tests Performed. The Apache SpamAssassin Project, n.d. http://spamassassin.apache.org/tests.html (accessed February 19, 2011).

SpamBayes. n.d. SpamBayes: Background Reading. SpamBayes Website. http://spambayes.sourceforge.net/background.html (accessed February 6, 2012).

SpamCop FAQ. Why are auto responders bad? n.d. http://www.spamcop.net/fom-serve/cache/329.html (accessed February 6, 2012).

Spamhaus Project. About Spamhaus. http://www.spamhaus.org/organization/index.lasso (accessed January 31, 2012).

Symantec. Spam Filter Service - Filtering - Symantec.Cloud, n.d. http://www.symanteccloud.com/products/email/anti_spam.aspx (accessed February 6, 2012).

Templeton, B. n.d. Reaction to the DEC Spam of 1978. http://www.templetons.com/brad/spamreact.html (accessed April 6, 2011).

Templeton, Brad. n.d. Proper principles for Challenge/Response anti-spam systems. http://www.templetons.com/brad/spam/challengeresponse.html (accessed April 6, 2011).

Treviño, A. and J. Ekstrom. 2007a. Email sender authentication: advantages and shortcomings. Virus Bulletin, August 1, 2007.

Treviño, A. and J. Ekstrom. 2007b. Spam Filtering Through Header Relay Detection. In MIT Spam Conference. Cambridge, MA, 2007.

Websense. Websense Advanced Classification Engine, n.d. http://www.websense.com/content/websense-advanced-classification-engine.aspx (accessed November 22, 2011).

Wong, M., and W. Schlitt. 2006. RFC 4408: Sender Policy Framework (SPF) for Authorizing Use of Domains in E-Mail, Version 1. Internet Engineering Task Force, April 2006. http://tools.ietf.org/html/rfc4408.

Yamai, N., K. Okayama, T. Seike, K. Kawano, M. Nakamura, and S. Maruyama. 2008. An Anti-Spam Method with SMTP Session Abort. MIT Spam Conference 2008. http://projects.csail.mit.edu/spamconf/SC2008/Yamai_SMTP-session-abort/yamai_SMTP_session_abort.pdf.

Youngman, N. 2004. Fine-tuning SpamAssassin LG #105. Linux Gazette, August 2004. http://linuxgazette.net/105/youngman.html. (accessed February 6, 2012).

Zdziarski, J. 2005. Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification. No Starch Press.

Zdziarski, J., W. Yang, and P. Judge. 2006. Approaches to Phishing Identification Using Match and Probabilistic Digital Fingerprinting Techniques. In MIT Spam Conference. Cambridge, MA, 2006.

Zhang L., J. Zhu, T. Yao. 2004. An evaluation of statistical spam filtering techniques. ACM Transactions on Asian Language Information Processing (TALIP) 3, no. 4 (December 2004).

APPENDIX A: FEATURE SETS

The following table contains a list of the different feature sets producible by the feature extractor. Each entry contains the feature set name, the way it was referenced in the text, and the specific features it includes. (References to "short body" mean only the first and last 1,024 characters of a body part were used.)

Feature Set Descriptions

unigram (Unigram)
  • Unigram distribution of message body

bigram (Unigram + Bigram)
  • Unigram and bigram distributions of message body

trigram (Unigram + Bigram + Trigram / Body)
  • Unigram, bigram and trigram distributions of message body

headers_only (Headers Only)
  • Unigram, bigram and trigram distributions of select headers

headers_body (Headers + Body)
  • Unigram, bigram and trigram distributions of:
    ◦ Select headers
    ◦ Message body

headers_body_count (Headers + Body + Counts)
  • Unigram, bigram and trigram distributions of:
    ◦ Select headers
    ◦ Message body
  • Multi-part counts

headers_body_split (Headers + Body + HTML)
  • Unigram, bigram and trigram distributions of:
    ◦ Select headers
    ◦ Plain text body
    ◦ HTML body

headers_body_split_strip (Headers + Body + HTML Text)
  • Unigram, bigram and trigram distributions of:
    ◦ Select headers
    ◦ Plain text body
    ◦ HTML body
    ◦ HTML text (no HTML tags)

short_body (Short Body)
  • Unigram, bigram and trigram distributions of short message body

headers_short_body (Headers + Short Body)
  • Unigram, bigram and trigram distributions of:
    ◦ Select headers
    ◦ Short message body

headers_short_body_strip (Headers + Short Body + HTML Text)
  • Unigram, bigram and trigram distributions of:
    ◦ Select headers
    ◦ Short plain text body and HTML text

headers_short_body_split_strip (Headers + Short Body + HTML Text)
  • Unigram, bigram and trigram distributions of:
    ◦ Short plain text body
    ◦ Short HTML body
    ◦ Short HTML text (no HTML tags)

full (Headers + Body + HTML Text + URL)
  • Unigram, bigram and trigram distributions of:
    ◦ Select headers
    ◦ Plain text body
    ◦ HTML body
    ◦ HTML text (no HTML tags)
  • Multi-part counts
  • URL analysis

full_short (Headers + Short Body + HTML Text + URL)
  • Unigram, bigram and trigram distributions of:
    ◦ Select headers
    ◦ Short plain text body
    ◦ Short HTML body
    ◦ Short HTML text (no HTML tags)
  • Multi-part counts
  • URL analysis

full_short_no_split (Headers + Short Body + HTML + URL)
  • Unigram, bigram and trigram distributions of:
    ◦ Select headers
    ◦ Short plain text body, short HTML body and short HTML text (no HTML tags)
  • Multi-part counts
  • URL analysis

full_short_no_split_no_count (Headers + Short Body + HTML + URL)
  • Unigram, bigram and trigram distributions of:
    ◦ Select headers
    ◦ Short plain text body, short HTML body and short HTML text (no HTML tags)
  • URL analysis

full_short_no_split_no_headers (Short Body + HTML + URL)
  • Unigram, bigram and trigram distributions of:
    ◦ Short plain text body, short HTML body and short HTML text (no HTML tags)
  • Multi-part counts
  • URL analysis

APPENDIX B: RAW CLASSIFIER RESULTS

The following are the full results of the classification tests for Section 4.2.

Ham vs. Spam Results (c = 100)
Confusion cells give, for each actual class, the counts predicted as ham / spam.

Feature Set                    | Ham        | Spam        | Accuracy | Precision | Recall | Running Time
unigram                        | 6794 / 160 | 462 / 1972  | 93.37%   | 93.63%    | 97.70% | 0:00:59
bigram                         | 6915 / 39  | 106 / 2292  | 98.45%   | 98.49%    | 99.44% | 0:02:53
trigram                        | 6930 / 24  | 79 / 2319   | 98.90%   | 98.87%    | 99.65% | 0:11:06
headers_only                   | 6774 / 180 | 231 / 2167  | 95.61%   | 96.70%    | 97.41% | 0:00:58
headers_body                   | 6931 / 23  | 48 / 2341   | 99.24%   | 99.31%    | 99.67% | 0:16:39
headers_body_count             | 6931 / 23  | 48 / 2350   | 99.24%   | 99.31%    | 99.67% | 0:16:33
headers_body_split             | 6931 / 23  | 55 / 2343   | 99.17%   | 99.21%    | 99.67% | 0:21:30
headers_body_split_strip       | 6931 / 23  | 57 / 2341   | 99.14%   | 99.18%    | 99.67% | 0:29:46
short_body                     | 6930 / 24  | 84 / 2314   | 98.85%   | 98.80%    | 99.65% | 0:07:53
headers_short_body             | 6931 / 23  | 59 / 2339   | 99.12%   | 99.16%    | 99.67% | 0:10:39
headers_short_body_strip       | 6931 / 23  | 59 / 2339   | 99.12%   | 99.16%    | 99.67% | 0:10:40
headers_short_body_split_strip | 6927 / 27  | 60 / 2338   | 99.07%   | 99.14%    | 99.61% | 0:11:59
full                           | 6931 / 23  | 56 / 2342   | 99.16%   | 99.20%    | 99.67% | 0:30:47
full_short                     | 6928 / 26  | 64 / 2334   | 99.04%   | 99.08%    | 99.63% | 0:11:49
full_short_no_split            | 6930 / 24  | 57 / 2341   | 99.13%   | 99.18%    | 99.65% | 0:10:33
full_short_no_split_no_count   | 6932 / 22  | 60 / 2338   | 99.12%   | 99.14%    | 99.68% | 0:10:33

The following are the full results of the classification tests in Section 4.3 for the purposes of finding the optimal value of c.

Ham vs. Spam Results
Confusion cells give, for each actual class, the counts predicted as ham / spam.

Feature Set                  | c     | Ham        | Spam        | Accuracy | Precision | Recall | Running Time
full                         | 0.1   | 6861 / 93  | 1012 / 1386 | 88.18%   | 87.15%    | 98.66% | 0:07:00
full                         | 1     | 6892 / 62  | 412 / 1986  | 94.93%   | 94.36%    | 99.11% | 0:11:04
full                         | 10    | 6923 / 31  | 129 / 2269  | 98.29%   | 98.17%    | 99.55% | 0:16:47
full                         | 100   | 6931 / 23  | 56 / 2342   | 99.16%   | 99.20%    | 99.67% | 0:30:49
full                         | 500   | 6929 / 25  | 53 / 2345   | 99.17%   | 99.24%    | 99.64% | 0:54:10
full                         | 1000  | 6929 / 25  | 53 / 2345   | 99.17%   | 99.24%    | 99.64% | 0:54:15
full                         | 5000  | 6929 / 25  | 53 / 2345   | 99.17%   | 99.24%    | 99.64% | 0:54:13
full                         | 10000 | 6929 / 25  | 53 / 2345   | 99.17%   | 99.24%    | 99.64% | 0:54:19
full_short                   | 0.1   | 6633 / 311 | 1158 / 1240 | 84.28%   | 85.14%    | 95.52% | 0:03:30
full_short                   | 1     | 6847 / 107 | 650 / 1748  | 91.91%   | 91.33%    | 98.46% | 0:04:55
full_short                   | 10    | 6884 / 70  | 135 / 2263  | 97.81%   | 98.08%    | 98.99% | 0:07:54
full_short                   | 100   | 6928 / 26  | 64 / 2334   | 99.04%   | 99.08%    | 99.63% | 0:11:46
full_short                   | 500   | 6932 / 22  | 57 / 2341   | 99.16%   | 99.18%    | 99.68% | 0:20:53
full_short                   | 1000  | 6931 / 23  | 56 / 2342   | 99.16%   | 99.20%    | 99.67% | 0:21:10
full_short                   | 5000  | 6931 / 23  | 56 / 2342   | 99.16%   | 99.20%    | 99.67% | 0:21:09
full_short                   | 10000 | 6931 / 23  | 56 / 2342   | 99.16%   | 99.20%    | 99.67% | 0:21:10
full_short_no_split          | 0.1   | 6696 / 258 | 1362 / 1036 | 82.68%   | 83.10%    | 96.29% | 0:03:18
full_short_no_split          | 1     | 6862 / 92  | 598 / 1800  | 92.62%   | 91.98%    | 98.68% | 0:04:55
full_short_no_split          | 10    | 6897 / 57  | 123 / 2275  | 98.08%   | 98.25%    | 99.18% | 0:07:14
full_short_no_split          | 100   | 6930 / 24  | 57 / 2341   | 99.13%   | 99.18%    | 99.65% | 0:10:31
full_short_no_split          | 500   | 6934 / 20  | 54 / 2344   | 99.21%   | 99.23%    | 99.71% | 0:18:09
full_short_no_split          | 1000  | 6936 / 18  | 55 / 2343   | 99.22%   | 99.21%    | 99.74% | 0:18:43
full_short_no_split          | 5000  | 6936 / 18  | 55 / 2343   | 99.22%   | 99.21%    | 99.74% | 0:18:48
full_short_no_split          | 10000 | 6936 / 18  | 55 / 2343   | 99.22%   | 99.21%    | 99.74% | 0:18:45
full_short_no_split_no_count | 0.1   | 6705 / 249 | 1360 / 1038 | 82.80%   | 83.14%    | 96.42% | 0:03:18
full_short_no_split_no_count | 1     | 6857 / 97  | 595 / 1803  | 92.60%   | 92.02%    | 98.61% | 0:04:59
full_short_no_split_no_count | 10    | 6895 / 59  | 120 / 2278  | 98.09%   | 98.29%    | 99.15% | 0:07:13
full_short_no_split_no_count | 100   | 6932 / 22  | 60 / 2338   | 99.13%   | 99.14%    | 99.68% | 0:10:30
full_short_no_split_no_count | 500   | 6935 / 19  | 57 / 2341   | 99.19%   | 99.18%    | 99.73% | 0:18:10
full_short_no_split_no_count | 1000  | 6935 / 19  | 54 / 2344   | 99.22%   | 99.23%    | 99.73% | 0:18:57
full_short_no_split_no_count | 5000  | 6935 / 19  | 54 / 2344   | 99.22%   | 99.23%    | 99.73% | 0:18:59
full_short_no_split_no_count | 10000 | 6935 / 19  | 54 / 2344   | 99.22%   | 99.23%    | 99.73% | 0:18:59

The following are the full results of the classification tests in Section 4.4.

Blind Phish Results
Confusion cells give the number of phish messages predicted as ham / spam.

Feature Set                    | c    | Phish      | Accuracy | Running Time
headers_body_split_strip       | 100  | 598 / 1677 | 73.71%   | 0:03:18
headers_body_split_strip       | 500  | 426 / 1849 | 81.27%   | 0:06:52
headers_body_split_strip       | 1000 | 426 / 1849 | 81.27%   | 0:06:53
headers_body_split_strip       | 5000 | 426 / 1849 | 81.27%   | 0:06:49
headers_short_body_strip       | 100  | 637 / 1638 | 72.00%   | 0:01:12
headers_short_body_strip       | 500  | 604 / 1671 | 73.45%   | 0:01:57
headers_short_body_strip       | 1000 | 591 / 1684 | 74.02%   | 0:02:07
headers_short_body_strip       | 5000 | 591 / 1684 | 74.02%   | 0:02:07
headers_short_body_split_strip | 100  | 425 / 1850 | 81.32%   | 0:01:22
headers_short_body_split_strip | 500  | 266 / 2009 | 88.31%   | 0:02:11
headers_short_body_split_strip | 1000 | 182 / 2093 | 92.00%   | 0:02:14
headers_short_body_split_strip | 5000 | 182 / 2093 | 92.00%   | 0:02:14
full                           | 100  | 495 / 1780 | 78.24%   | 0:03:31
full                           | 500  | 567 / 1708 | 75.08%   | 0:05:18
full                           | 1000 | 567 / 1708 | 75.08%   | 0:05:16
full                           | 5000 | 567 / 1708 | 75.08%   | 0:05:14
full_short                     | 100  | 362 / 1913 | 84.09%   | 0:01:26
full_short                     | 500  | 148 / 2127 | 93.49%   | 0:02:21
full_short                     | 1000 | 224 / 2051 | 90.15%   | 0:02:23
full_short                     | 5000 | 224 / 2051 | 90.15%   | 0:02:20
full_short_no_split            | 100  | 660 / 1615 | 70.99%   | 0:01:10
full_short_no_split            | 500  | 578 / 1697 | 74.59%   | 0:02:04
full_short_no_split            | 1000 | 597 / 1678 | 73.76%   | 0:02:17
full_short_no_split            | 5000 | 597 / 1678 | 73.76%   | 0:02:18
full_short_no_split_no_count   | 100  | 655 / 1620 | 71.21%   | 0:01:11
full_short_no_split_no_count   | 500  | 503 / 1772 | 77.89%   | 0:01:55
full_short_no_split_no_count   | 1000 | 619 / 1656 | 72.79%   | 0:02:01
full_short_no_split_no_count   | 5000 | 619 / 1656 | 72.79%   | 0:02:02

The following are the full classification results for tests in Section 4.5.

Ham vs. Spam + Phish Results
Confusion cells give, for each actual class, the counts predicted as ham / spam.

Feature Set                    | c    | Ham       | Spam      | Phish     | Accuracy | Precision | Recall | Running Time
trigram (body)                 | 100  | 6926 / 28 | 71 / 2327 | 27 / 2248 | 98.92%   | 98.60%    | 99.60% | 0:16:25
trigram (body)                 | 500  | 6930 / 24 | 48 / 2350 | 25 / 2250 | 99.17%   | 98.96%    | 99.65% | 0:25:10
trigram (body)                 | 1000 | 6932 / 22 | 50 / 2348 | 26 / 2249 | 99.16%   | 98.92%    | 99.68% | 0:31:48
trigram (body)                 | 5000 | 6931 / 23 | 47 / 2351 | 28 / 2247 | 99.16%   | 98.93%    | 99.67% | 0:56:38
headers_body                   | 100  | 6929 / 25 | 43 / 2355 | 10 / 2265 | 99.33%   | 99.24%    | 99.64% | 0:22:42
headers_body                   | 500  | 6931 / 23 | 40 / 2358 | 9 / 2266  | 99.38%   | 99.30%    | 99.67% | 0:39:10
headers_body                   | 1000 | 6932 / 22 | 37 / 2361 | 8 / 2267  | 99.42%   | 99.36%    | 99.68% | 0:43:06
headers_body                   | 5000 | 6932 / 22 | 37 / 2361 | 8 / 2267  | 99.42%   | 99.36%    | 99.68% | 0:43:18
headers_body_split_strip       | 100  | 6924 / 30 | 51 / 2347 | 10 / 2265 | 99.22%   | 99.13%    | 99.57% | 0:42:16
headers_body_split_strip       | 500  | 6929 / 25 | 45 / 2353 | 8 / 2267  | 99.33%   | 99.24%    | 99.64% | 1:19:02
headers_body_split_strip       | 1000 | 6930 / 24 | 45 / 2353 | 7 / 2268  | 99.35%   | 99.26%    | 99.65% | 1:22:01
headers_body_split_strip       | 5000 | 6930 / 24 | 45 / 2353 | 7 / 2268  | 99.35%   | 99.26%    | 99.65% | 1:22:02
short_body                     | 100  | 6926 / 28 | 74 / 2324 | 24 / 2251 | 98.92%   | 98.60%    | 99.60% | 0:10:37
short_body                     | 500  | 6936 / 18 | 66 / 2332 | 26 / 2249 | 99.03%   | 98.63%    | 99.73% | 0:15:59
short_body                     | 1000 | 6936 / 18 | 56 / 2342 | 22 / 2253 | 99.17%   | 98.89%    | 99.74% | 0:20:50
short_body                     | 5000 | 6937 / 17 | 58 / 2340 | 24 / 2251 | 99.15%   | 98.83%    | 99.76% | 0:38:04
headers_short_body             | 100  | 6927 / 27 | 48 / 2350 | 13 / 2262 | 99.24%   | 99.13%    | 99.61% | 0:14:18
headers_short_body             | 500  | 6933 / 21 | 47 / 2351 | 6 / 2269  | 99.36%   | 99.24%    | 99.70% | 0:23:28
headers_short_body             | 1000 | 6932 / 22 | 50 / 2348 | 7 / 2268  | 99.32%   | 99.18%    | 99.68% | 0:25:50
headers_short_body             | 5000 | 6932 / 22 | 50 / 2348 | 7 / 2268  | 99.32%   | 99.18%    | 99.68% | 0:25:43
headers_short_body_strip       | 100  | 6927 / 27 | 48 / 2350 | 13 / 2262 | 99.24%   | 99.13%    | 99.61% | 0:14:08
headers_short_body_strip       | 500  | 6933 / 21 | 47 / 2351 | 6 / 2269  | 99.36%   | 99.24%    | 99.70% | 0:23:02
headers_short_body_strip       | 1000 | 6932 / 22 | 50 / 2348 | 7 / 2268  | 99.32%   | 99.18%    | 99.68% | 0:25:13
headers_short_body_strip       | 5000 | 6932 / 22 | 50 / 2348 | 7 / 2268  | 99.32%   | 99.18%    | 99.68% | 0:25:14
headers_short_body_split_strip | 100  | 6924 / 30 | 56 / 2342 | 13 / 2262 | 99.15%   | 99.01%    | 99.57% | 0:15:23
headers_short_body_split_strip | 500  | 6928 / 26 | 58 / 2340 | 7 / 2268  | 99.22%   | 99.07%    | 99.63% | 0:25:25
headers_short_body_split_strip | 1000 | 6931 / 23 | 57 / 2341 | 7 / 2268  | 99.25%   | 99.09%    | 99.67% | 0:28:04
headers_short_body_split_strip | 5000 | 6931 / 23 | 57 / 2341 | 7 / 2268  | 99.25%   | 99.09%    | 99.67% | 0:28:03
full                           | 100  | 6925 / 29 | 51 / 2347 | 11 / 2264 | 99.22%   | 99.11%    | 99.58% | 0:42:00
full                           | 500  | 6929 / 25 | 45 / 2353 | 10 / 2265 | 99.31%   | 99.21%    | 99.64% | 1:16:35
full                           | 1000 | 6929 / 25 | 41 / 2357 | 11 / 2264 | 99.34%   | 99.26%    | 99.64% | 1:20:59
full                           | 5000 | 6929 / 25 | 41 / 2357 | 11 / 2264 | 99.34%   | 99.26%    | 99.64% | 1:21:04
full_short                     | 100  | 6924 / 30 | 56 / 2342 | 14 / 2261 | 99.14%   | 99.00%    | 99.57% | 0:15:10
full_short                     | 500  | 6929 / 25 | 57 / 2341 | 5 / 2270  | 99.25%   | 99.11%    | 99.64% | 0:25:06
full_short                     | 1000 | 6924 / 30 | 58 / 2340 | 7 / 2268  | 99.18%   | 99.07%    | 99.57% | 0:27:37
full_short                     | 5000 | 6930 / 24 | 58 / 2340 | 7 / 2268  | 99.23%   | 99.07%    | 99.65% | 0:27:38
full_short_no_split            | 100  | 6927 / 27 | 50 / 2348 | 12 / 2263 | 99.23%   | 99.11%    | 99.61% | 0:13:55
full_short_no_split            | 500  | 6933 / 21 | 49 / 2349 | 6 / 2269  | 99.35%   | 99.21%    | 99.70% | 0:22:52
full_short_no_split            | 1000 | 6934 / 20 | 49 / 2349 | 3 / 2272  | 99.38%   | 99.26%    | 99.71% | 0:26:03
full_short_no_split            | 5000 | 6934 / 20 | 49 / 2349 | 3 / 2272  | 99.38%   | 99.26%    | 99.71% | 0:26:03
full_short_no_split_no_count   | 100  | 6929 / 25 | 46 / 2352 | 12 / 2263 | 99.29%   | 99.17%    | 99.64% | 0:13:59
full_short_no_split_no_count   | 500  | 6934 / 20 | 50 / 2348 | 6 / 2269  | 99.35%   | 99.20%    | 99.71% | 0:23:08
full_short_no_split_no_count   | 1000 | 6933 / 21 | 50 / 2348 | 6 / 2269  | 99.34%   | 99.20%    | 99.70% | 0:27:22
full_short_no_split_no_count   | 5000 | 6933 / 20 | 50 / 2348 | 6 / 2269  | 99.35%   | 99.20%    | 99.71% | 0:27:45

The following are the full classification results of tests in Section 4.6.

Ham vs. Spam vs. Phish Results
Confusion cells give, for each actual class, the counts predicted as ham / spam / phish.

Feature Set                    | c    | Ham           | Spam           | Phish          | Accuracy | Precision | Recall | Running Time
trigram (body)                 | 100  | 6928 / 23 / 3 | 76 / 2301 / 21 | 26 / 44 / 2205 | 98.90%   | 98.55%    | 99.63% | 0:40:05
trigram (body)                 | 500  | 6940 / 12 / 2 | 57 / 2318 / 23 | 20 / 48 / 2207 | 99.22%   | 98.90%    | 99.80% | 1:08:05
trigram (body)                 | 1000 | 6941 / 11 / 2 | 49 / 2326 / 23 | 22 / 45 / 2208 | 99.28%   | 98.99%    | 99.81% | 1:30:03
trigram (body)                 | 5000 | 6940 / 14 / 0 | 47 / 2328 / 23 | 22 / 51 / 2202 | 99.29%   | 99.02%    | 99.80% | 2:58:51
headers_body                   | 100  | 6929 / 21 / 4 | 46 / 2336 / 16 | 9 / 38 / 2228  | 99.31%   | 99.21%    | 99.64% | 0:57:54
headers_body                   | 500  | 6929 / 18 / 7 | 38 / 2346 / 14 | 6 / 35 / 2234  | 99.41%   | 99.37%    | 99.64% | 1:45:41
headers_body                   | 1000 | 6928 / 20 / 6 | 42 / 2345 / 11 | 6 / 32 / 2237  | 99.36%   | 99.31%    | 99.63% | 2:05:57
headers_body                   | 5000 | 6928 / 20 / 6 | 42 / 2345 / 11 | 6 / 32 / 2237  | 99.36%   | 99.31%    | 99.63% | 2:04:03
headers_body_split_strip       | 100  | 6927 / 22 / 5 | 45 / 2337 / 16 | 11 / 45 / 2219 | 99.29%   | 99.20%    | 99.61% | 2:18:10
headers_body_split_strip       | 500  | 6931 / 18 / 5 | 44 / 2331 / 23 | 12 / 46 / 2217 | 99.32%   | 99.20%    | 99.67% | 4:16:46
headers_body_split_strip       | 1000 | 6927 / 22 / 5 | 45 / 2337 / 16 | 11 / 45 / 2219 | 99.29%   | 99.20%    | 99.61% | 4:27:57
headers_body_split_strip       | 5000 | 6927 / 22 / 5 | 45 / 2337 / 16 | 11 / 45 / 2219 | 99.29%   | 99.20%    | 99.61% | 4:16:59
short_body                     | 100  | 6929 / 21 / 4 | 79 / 2303 / 16 | 23 / 43 / 2209 | 98.91%   | 98.55%    | 99.64% | 0:25:36
short_body                     | 500  | 6937 / 12 / 5 | 66 / 2315 / 17 | 21 / 50 / 2204 | 98.92%   | 98.76%    | 99.76% | 0:40:05
short_body                     | 1000 | 6938 / 11 / 5 | 64 / 2318 / 16 | 24 / 47 / 2204 | 99.11%   | 98.75%    | 99.77% | 0:53:59
short_body                     | 5000 | 6937 / 13 / 4 | 55 / 2320 / 23 | 22 / 51 / 2202 | 99.19%   | 98.90%    | 99.76% | 1:49:01
headers_short_body             | 100  | 6927 / 21 / 6 | 50 / 2341 / 7  | 6 / 25 / 2244  | 99.29%   | 99.20%    | 99.61% | 0:33:45
headers_short_body             | 500  | 6934 / 14 / 6 | 47 / 2346 / 5  | 5 / 26 / 2244  | 99.38%   | 99.26%    | 99.71% | 0:59:38
headers_short_body             | 1000 | 6934 / 14 / 6 | 47 / 2348 / 3  | 6 / 20 / 2249  | 99.37%   | 99.24%    | 99.71% | 1:13:29
headers_short_body             | 5000 | 6934 / 14 / 6 | 47 / 2348 / 3  | 6 / 20 / 2249  | 99.37%   | 99.24%    | 99.71% | 1:13:27
headers_short_body_strip       | 100  | 6927 / 21 / 6 | 50 / 2341 / 7  | 6 / 25 / 2244  | 99.29%   | 99.20%    | 99.61% | 0:35:23
headers_short_body_strip       | 500  | 6934 / 14 / 6 | 47 / 2346 / 5  | 5 / 26 / 2244  | 99.38%   | 99.26%    | 99.71% | 1:02:41
headers_short_body_strip       | 1000 | 6934 / 14 / 6 | 47 / 2348 / 3  | 6 / 20 / 2249  | 99.37%   | 99.24%    | 99.71% | 1:16:10
headers_short_body_strip       | 5000 | 6934 / 14 / 6 | 47 / 2348 / 3  | 6 / 20 / 2249  | 99.37%   | 99.24%    | 99.71% | 1:16:29
headers_short_body_split_strip | 100  | 6927 / 22 / 5 | 62 / 2328 / 8  | 6 / 28 / 2241  | 99.18%   | 99.03%    | 99.61% | 0:40:26
headers_short_body_split_strip | 500  | 6931 / 16 / 7 | 54 / 2339 / 5  | 7 / 28 / 2240  | 99.28%   | 99.13%    | 99.67% | 1:13:10
headers_short_body_split_strip | 1000 | 6931 / 17 / 6 | 56 / 2339 / 3  | 6 / 30 / 2239  | 99.27%   | 99.11%    | 99.67% | 1:28:38
headers_short_body_split_strip | 5000 | 6931 / 17 / 6 | 56 / 2339 / 3  | 6 / 30 / 2239  | 99.27%   | 99.11%    | 99.67% | 1:26:38
full                           | 100  | 6928 / 23 / 3 | 53 / 2324 / 21 | 12 / 43 / 2220 | 99.22%   | 99.07%    | 99.63% | 1:53:54
full                           | 500  | 6929 / 20 / 5 | 49 / 2329 / 20 | 10 / 45 / 2220 | 99.28%   | 99.16%    | 99.64% | 3:48:52
full                           | 1000 | 6926 / 23 / 5 | 43 / 2334 / 21 | 11 / 43 / 2221 | 99.29%   | 99.23%    | 99.60% | 4:25:44
full                           | 5000 | 6926 / 23 / 5 | 43 / 2334 / 21 | 11 / 43 / 2221 | 99.29%   | 99.23%    | 99.60% | 4:26:37
full_short                     | 100  | 6927 / 23 / 4 | 62 / 2327 / 9  | 7 / 28 / 2240  | 99.17%   | 99.01%    | 99.61% | 0:39:30
full_short                     | 500  | 6931 / 17 / 6 | 51 / 2342 / 5  | 7 / 30 / 2238  | 99.30%   | 99.17%    | 99.67% | 1:10:56
full_short                     | 1000 | 6931 / 18 / 5 | 53 / 2341 / 4  | 9 / 30 / 2236  | 99.27%   | 99.11%    | 99.67% | 1:26:56
full_short                     | 5000 | 6931 / 18 / 5 | 53 / 2341 / 4  | 9 / 30 / 2236  | 99.27%   | 99.11%    | 99.67% | 1:26:47
full_short_no_split            | 100  | 6927 / 21 / 6 | 54 / 2338 / 6  | 9 / 26 / 2240  | 99.23%   | 99.10%    | 99.61% | 0:33:07
full_short_no_split            | 500  | 6934 / 14 / 6 | 50 / 2344 / 4  | 4 / 21 / 2250  | 99.36%   | 99.23%    | 99.71% | 0:59:04
full_short_no_split            | 1000 | 6932 / 16 / 6 | 51 / 2343 / 4  | 7 / 21 / 2247  | 99.31%   | 99.17%    | 99.68% | 1:11:34
full_short_no_split            | 5000 | 6932 / 16 / 6 | 51 / 2343 / 4  | 7 / 21 / 2247  | 99.31%   | 99.17%    | 99.68% | 1:11:30
full_short_no_split_no_count   | 100  | 6926 / 22 / 6 | 49 / 2341 / 8  | 6 / 24 / 2245  | 99.29%   | 99.21%    | 99.60% | 0:33:35
full_short_no_split_no_count   | 500  | 6935 / 13 / 6 | 48 / 2346 / 4  | 3 / 23 / 2249  | 99.40%   | 99.27%    | 99.73% | 0:59:44
full_short_no_split_no_count   | 1000 | 6932 / 16 / 6 | 50 / 2343 / 5  | 5 / 25 / 2245  | 99.34%   | 99.21%    | 99.68% | 1:13:45
full_short_no_split_no_count   | 5000 | 6932 / 16 / 6 | 50 / 2343 / 5  | 5 / 25 / 2245  | 99.34%   | 99.21%    | 99.68% | 1:13:36

The following are the full classification results of tests in Section 4.8 on the 2005 TREC corpus.

2005 TREC Results (c = 500)
Confusion cells give, for each actual class, the counts predicted as ham / spam (/ phish for the three-way test).

Test                   | Ham           | Spam             | Phish           | Accuracy | Precision | Recall | Running Time
ham vs. spam           | 2074 / 52     | 129 / 7745       | –               | 98.19%   | 94.14%    | 97.55% | 0:10:13
blind phish            | –             | –                | 111 / 2169      | 95.13%   | –         | –      | 0:01:13
ham vs. spam + phish   | 2063 / 63     | 96 / 7778        | 66 / 2214       | 98.17%   | 92.72%    | 97.04% | 0:12:22
ham vs. spam vs. phish | 2065 / 57 / 4 | 101 / 7645 / 128 | 62 / 194 / 2024 | 98.18%   | 92.68%    | 97.13% | 0:49:00

APPENDIX C: CLASSIFIER PERFORMANCE COMPARISONS

The following is a summary of the accuracy of all three classifiers for the tests performed in Section 4.2. Entries with a long dash (–) mean the feature set/classifier combination was not run. Two asterisks (**) mean the Maximum Entropy classifier did not run properly and the result was thrown out.

Feature Set                    | Accuracy (NB / MaxEnt / SVM) | Running Time (NB / MaxEnt / SVM)
unigram                        | 86.80% / ** / 93.37%         | 0:00:03 / ** / 0:00:59
bigram                         | 87.91% / ** / 98.45%         | 0:00:13 / ** / 0:02:53
trigram                        | 90.77% / ** / 98.90%         | 0:00:39 / ** / 0:11:06
headers_only                   | 95.54% / 99.25% / 95.61%     | 0:00:11 / 0:01:59 / 0:00:58
headers_body                   | 92.98% / ** / 99.24%         | 0:01:07 / ** / 0:16:39
headers_body_count             | 92.80% / ** / 99.24%         | 0:00:59 / ** / 0:16:33
headers_body_split             | 88.03% / ** / 99.17%         | 0:00:48 / ** / 0:21:30
headers_body_split_strip       | 87.11% / ** / 99.14%         | 0:00:57 / ** / 0:29:46
short_body                     | 77.54% / 99.20% / 98.85%     | 0:00:29 / 0:19:49 / 0:07:53
headers_short_body             | 81.02% / 99.27% / 99.12%     | 0:00:41 / 0:28:29 / 0:10:39
headers_short_body_strip       | 80.49% / 99.52% / 99.12%     | 0:00:40 / 0:28:37 / 0:10:40
headers_short_body_split_strip | 85.88% / 99.33% / 99.07%     | 0:00:42 / 0:30:39 / 0:11:59
full                           | 87.63% / ** / 99.16%         | 0:01:06 / ** / 0:30:47
full_short                     | 86.62% / 99.35% / 99.04%     | 0:00:42 / 0:32:02 / 0:11:49
full_short_no_split            | 79.85% / 99.39% / 99.13%     | 0:00:41 / 0:27:33 / 0:10:33
full_short_no_split_no_count   | 80.92% / 99.43% / 99.12%     | 0:00:41 / 0:28:18 / 0:10:33

The following is a summary of the accuracy of all three classifiers for the tests performed in Section 4.5. Entries with a long dash (–) mean the feature set/classifier combination was not run. Two asterisks (**) mean the Maximum Entropy classifier did not run properly and the result was thrown out.

Feature Set                    | Accuracy (NB / MaxEnt / SVM) | Running Time (NB / MaxEnt / SVM)
unigram                        | 80.03% / ** / –              | 0:00:04 / ** / –
bigram                         | 85.19% / ** / –              | 0:00:15 / ** / –
trigram                        | 88.05% / ** / 99.17%         | 0:01:07 / ** / 0:25:10
headers_only                   | 96.08% / 99.44% / –          | 0:00:13 / 0:02:34 / –
headers_body                   | 90.97% / ** / 99.38%         | 0:01:05 / ** / 0:39:10
headers_body_count             | 91.01% / ** / –              | 0:01:14 / ** / –
headers_body_split             | 87.71% / ** / –              | 0:01:39 / ** / –
headers_body_split_strip       | 88.51% / ** / 99.33%         | 0:01:29 / ** / 1:19:02
short_body                     | 78.87% / 99.29% / 99.03%     | 0:00:41 / 0:32:22 / 0:15:59
headers_short_body             | 81.47% / 99.60% / 99.36%     | 0:01:03 / 0:38:28 / 0:23:28
headers_short_body_strip       | 81.47% / 99.55% / 99.36%     | 0:01:02 / 0:40:30 / 0:23:02
headers_short_body_split_strip | 87.33% / 99.43% / 99.22%     | 0:01:18 / 0:44:24 / 0:25:25
full                           | 88.19% / ** / 99.31%         | 0:01:52 / ** / 1:16:35
full_short                     | 87.70% / 99.37% / 99.25%     | 0:00:56 / 0:43:39 / 0:25:06
full_short_no_split            | 81.02% / 99.45% / 99.35%     | 0:00:52 / 0:38:38 / 0:22:52
full_short_no_split_no_count   | 80.50% / 99.58% / 99.35%     | 0:00:52 / 0:40:23 / 0:23:08

The following is a summary of the accuracy of all three classifiers for the tests performed in Section 4.6. Entries with a long dash (–) mean the feature set/classifier combination was not run. Two asterisks (**) mean the Maximum Entropy classifier did not run properly and the result was thrown out.

Feature Set                    | Accuracy (NB / MaxEnt / SVM) | Running Time (NB / MaxEnt / SVM)
unigram                        | 75.42% / ** / –              | 0:00:08 / ** / –
bigram                         | 83.30% / ** / –              | 0:00:16 / ** / –
trigram                        | 87.64% / ** / 98.61%         | 0:01:00 / ** / 1:08:05
headers_only                   | 96.07% / 99.35% / –          | 0:00:13 / 0:04:29 / –
headers_body                   | 91.73% / ** / 98.99%         | 0:01:15 / ** / 1:45:41
headers_body_count             | 92.07% / ** / –              | 0:01:27 / ** / –
headers_body_split             | 80.87% / ** / –              | 0:01:34 / ** / –
headers_body_split_strip       | 83.04% / ** / 98.73%         | 0:01:51 / ** / 4:16:46
short_body                     | 74.63% / 99.10% / 98.22%     | 0:00:43 / 0:51:39 / 0:40:05
headers_short_body             | 77.81% / 99.29% / 99.11%     | 0:00:55 / 1:09:14 / 0:59:38
headers_short_body_strip       | 78.51% / 99.50% / 99.11%     | 0:00:55 / 1:07:17 / 1:02:41
headers_short_body_split_strip | 82.03% / 99.35% / 98.99%     | 0:00:57 / 1:16:04 / 1:13:10
full                           | 83.06% / ** / 98.72%         | 0:01:32 / ** / 3:48:52
full_short                     | 82.55% / 99.34% / 99.00%     | 0:01:00 / 1:19:51 / 1:10:56
full_short_no_split            | 77.52% / 99.40% / 99.15%     | 0:00:55 / 1:09:34 / 0:59:04
full_short_no_split_no_count   | 78.13% / 99.36% / 99.17%     | 0:00:55 / 1:10:04 / 0:59:44

APPENDIX D: PERFORMANCE DATA FOR P-VALUE CALCULATIONS

The following are the group classification results required for calculating the p-value. The accuracy is taken from the Headers + Short Body + HTML + URL feature set, with c = 500.

Accuracy by cross-validation group

Group     | Ham vs. Spam | Ham vs. Spam + Phish | Ham vs. Spam vs. Phish
01        | 99.359%      | 99.312%              | 99.656%
02        | 98.824%      | 98.968%              | 99.968%
03        | 99.679%      | 99.742%              | 99.742%
04        | 99.573%      | 99.398%              | 99.656%
05        | 98.825%      | 98.969%              | 98.969%
06        | 98.717%      | 99.139%              | 99.225%
07        | 99.251%      | 99.570%              | 99.226%
08        | 98.822%      | 99.139%              | 99.225%
09        | 99.144%      | 99.226%              | 98.968%
10        | 100.000%     | 100.000%             | 100.000%
mean      | 0.99219      | 0.99346              | 0.99464
std. dev. | 0.00434      | 0.00337              | 0.00388
p-value   | –            | 0.05695              | 0.48016

The following are the group classification results required for calculating the p-value for the 2005 TREC corpus. The accuracy is taken from the Short Body + HTML + URL feature set, with c = 500.

Accuracy by cross-validation group

Group     | Ham vs. Spam | Ham vs. Spam + Phish | Ham vs. Spam vs. Phish
01        | 96.800%      | 96.987%              | 97.476%
02        | 98.300%      | 98.371%              | 98.371%
03        | 98.000%      | 98.697%              | 98.697%
04        | 96.900%      | 96.906%              | 96.743%
05        | 99.400%      | 99.104%              | 99.186%
06        | 98.300%      | 98.941%              | 98.453%
07        | 98.800%      | 97.801%              | 97.883%
08        | 98.800%      | 98.453%              | 98.453%
09        | 98.800%      | 98.779%              | 98.779%
10        | 97.800%      | 97.638%              | 97.720%
mean      | 0.98190      | 0.98168              | 0.98176
std. dev. | 0.00843      | 0.00792              | 0.00721
p-value   | –            | 0.95206              | 0.9805

APPENDIX E: SOURCE CODE

This appendix contains the code I wrote to extract features and run the 10-way validation on the data. I have not chosen a license. For now, the code is available for non-commercial or research use. You may modify the code as much as you want. And, of course, there is no warranty of any kind. If you use this code, please reference this thesis in your work. Use of this code without attributing the author is not allowed. Commercial users, please contact the author before using this code.

feature_extractor.py is the class which extracts the features from an email message. It has several configurable properties which specify which features to extract. (Note: requires Python 3)

feature_extractor.py #!/usr/bin/env python ''' Created on Jul 5, 2011 Last modified on Sept. 11, 2011

@author: alberto '''

import re #import math

class EmailFeatureExtractor: # Class properties __vectors = {} __features = {} __charset = ""

doUnigram = True doBigram = False doTrigram = False doHeaders = True doPartCounts = False doBody = False doShortBody = False separateHtml = False stripHtml = False doUrlFeatures = False #normalizeCounts = False

# Constructor def __init__(self):

# Set the pre-defined features self.__vectors["MESSAGE_PARSE_ERROR"] = 1 ''' self.__vectors["BYTE_DECODE_ERROR"] = 2 self.__vectors["MESSAGE_IS_MULTIPART"] = 3 self.__vectors["HAS_PLAIN_TEXT_PART"] = 4 self.__vectors["HAS_HTML_PART"] = 5 self.__vectors["ATTACHMENT_COUNT"] = 6 self.__vectors["EMPTY_BODY"] = 7

112 self.__vectors["EMPTY_PLAIN_TEXT_PART"] = 8 self.__vectors["EMPTY_HTML_PART"] = 9 self.__vectors["EMPTY_OTHER_PART"] = 10

# Set the phish features self.__vectors["HAS_RAW_URL"] = 11 self.__vectors["NON_MATCHING_RAW_URL"] = 12 self.__vectors["USES_CLICK_HERE"] = 13 self.__vectors["URL_USES_IP"] = 14 self.__vectors["LINK_COUNT"] = 15 self.__vectors["LINK_DOMAIN_COUNT"] = 16 '''

def getVectors(self): return self.__vectors def extract(self, message): # See what character set we have; if not, set a default if message.get_charset() != None: self.__charset = message.get_charset() else: self.__charset = "iso-8859-1"

# Initialize the feature list for this message self.__features = {}

# See if we are going to do headers if self.doHeaders: if "Subject" in message: self.__extractNgrams("SUBJECT_", message["Subject"])

if "Return-Path" in message: self.__extractNgrams("RETURN_PATH_", message["Return-Path"])

if "From" in message: self.__extractNgrams("FROM_", message["From"])

if "To" in message: self.__extractNgrams("TO_", message["To"])

if "Mailing-List" in message: self.__extractNgrams("MAILING_LIST_", message["List-Id"])

if "List-Id" in message: self.__extractNgrams("LIST_ID_", message["List-Id"])

# See if we are doing anything with the body if self.doPartCounts or self.doBody or self.doShortBody \ or self.doUrlFeatures: # Extract features from the text parts of the message for part in message.walk(): # See if we have a multipart container if part.get_content_maintype() == 'multipart': if self.doPartCounts: self.__increase_feature_count("MESSAGE_IS_MULTIPART") continue

# Skip if this part is not text if part.get_content_maintype() != "text": if self.doPartCounts: self.__increase_feature_count("ATTACHMENT_COUNT") continue

# See what kind of text part we have content_type = "" if part.get_content_subtype() == "plain": if self.doPartCounts:

113 self.__increase_feature_count("HAS_PLAIN_TEXT_PART") if self.separateHtml: content_type = "PLAIN_TEXT" elif part.get_content_subtype() == "html": if self.doPartCounts: self.__increase_feature_count("HAS_HTML_PART") if self.separateHtml: content_type = "HTML" else: # Consider this an attachment if self.doPartCounts: self.__increase_feature_count("ATTACHMENT_COUNT") continue

# Get the text and decode it text = part.get_payload() if part.get_content_charset() != None: self.__charset = part.get_content_charset() if isinstance(text, bytes): try: text = text.decode(self.__charset) except: self._increase_feature_count("BYTE_DECODE_ERROR") text = text.decode(self.__charset, "ignore")

# See if the text part is empty if len(text.strip()) == 0: self.__increase_feature_count("EMPTY_" + content_type + "_PART") continue

# See if we are extracting features from the body if self.doBody: # Extract the ngrams self.__extractNgrams("BODY_" + content_type + "_", text)

# See if we need to do stripped HTML if content_type == "HTML" and self.stripHtml: p = re.compile(r'<.*?>') s_text = p.sub('', text) self.__extractNgrams("BODY_S" + content_type + "_", s_text)

# See if we are extracting features from the shortened body if self.doShortBody: # See if the length is 2048 or larger if len(text) >= 2048: # See which features we need to extract self.__extractNgrams("BODY_START_" + content_type + "_", text[:1024]) self.__extractNgrams("BODY_END_" + content_type + "_", text[-1024:]) else: # Extract features from the whole text self.__extractNgrams("BODY_" + content_type + "_", text)

# See if we need to do stripped HTML if type == "HTML" and self.stripHtml: p = re.compile(r'<.*?>') s_text = p.sub('', text)

if len(s_text) >= 2048: self.__extractNgrams("BODY_START_S" + content_type + "_", s_text[:1024]) self.__extractNgrams("BODY_END_S" + content_type + "_", s_text[-1024:]) else: self.__extractNgrams("BODY_S" + content_type + "_",

114 s_text)

if self.doUrlFeatures: if part.get_content_subtype() == "html": self.__extractUrlFeatures(text)

        return self.__features

    def __extractNgrams(self, vector_prefix, text):
        if self.doUnigram:
            self.__extractNgramsBySize(1, vector_prefix + "1_", text)

        if self.doBigram:
            self.__extractNgramsBySize(2, vector_prefix + "2_", text)

        if self.doTrigram:
            self.__extractNgramsBySize(3, vector_prefix + "3_", text)

    def __extractNgramsBySize(self, size, vector_prefix, text):
        # Make sure we have either text or bytes for the text
        if not isinstance(text, str) and not isinstance(text, bytes):
            return

        # See if we need to decode bytes
        if isinstance(text, bytes):
            try:
                text = text.decode(self.__charset)
            except:
                self.__increase_feature_count("BYTE_DECODE_ERROR")
                text = text.decode(self.__charset, "ignore")

        # Make sure we have a size greater than 0
        if size <= 0:
            return

        # Make sure the length of the text is greater than our ngram size
        if len(text) < size:
            # TODO: allow it to work with small amounts of text
            return

        # If size is greater than 1, begin extraction with start symbols
        for i in range(1, size):
            # Initialize the vector name
            vector = vector_prefix

            # Add the start symbols
            for j in range(0, size - i):
                vector += "[start]"

            # Add the start of the text
            vector += text[0:i]

            # Add the feature
            self.__increase_feature_count(vector)

        # Extract every ngram of the given size
        for i in range(0, len(text) - size + 1):
            # Create the vector
            vector = vector_prefix + text[i:i+size]

            # Add this feature
            self.__increase_feature_count(vector)

        # Add the stop symbol
        if size > 1:
            vector = vector_prefix + text[1 - size:] + "[stop]"

            # Add the feature
            self.__increase_feature_count(vector)

        # See if we are normalizing the counts
        #if self.normalizeCounts:
        #    # Get the total number of features
        #    total = 0
        #    for count in features:
        #        total += count
        #    norm_factor = total

        #    sum_sq = 0
        #    for count in features:
        #        sum_sq += count * count
        #    norm_factor = math.sqrt(sum_sq)

        #    # Adjust the entries
        #    for feature in features.keys():
        #        self.__features[feature] = \
        #            round(features[feature] / norm_factor, 6)
        #else:
        #    for feature in features.keys():
        #        self.__features[feature] = features[feature]

    def __extractUrlFeatures(self, text):
        # Get the link tags (href attribute and link text)
        links = re.findall(r'''<a[^>]*href=["']?([^"' >]+)["']?[^>]*>(.[^<>]*)</a>''',
                           str(text), re.I)
        if len(links) > 0:
            # Store the number of links
            self.__increase_feature_count("LINK_COUNT", len(links))

            # Create the list of unique domains
            domain_list = {}

            # Go through each link
            for link in links:
                # See if the text of the link is a URL
                url = re.search(r'''(http|https)://.+''', link[1], re.I)
                if url:
                    # We have raw URLs
                    self.__increase_feature_count("HAS_RAW_URL")

                    # See if the raw URL in the text matches the href link
                    if url.group().find(link[0]) < 0:
                        self.__increase_feature_count("NON_MATCHING_RAW_URL")
                else:
                    # See if the link text has HERE
                    if re.search(r'''h.*e.*r.*e''', link[1], re.I):
                        self.__increase_feature_count("USES_CLICK_HERE")

                # See if the href link uses an IP address
                if re.search(r'''[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.''' +
                             r'''[0-9]{1,3}''', link[0]):
                    self.__increase_feature_count("URL_USES_IP")

                # Add the domain to the list (skip relative links)
                url_split = link[0].split("/")
                if len(url_split) > 2:
                    domain_list[url_split[2]] = True

            # Add the count of distinct domains
            self.__increase_feature_count("LINK_DOMAIN_COUNT", len(domain_list))

    def __get_vector_number(self, vector):
        if vector not in self.__vectors:
            self.__vectors[vector] = len(self.__vectors) + 1
        return self.__vectors[vector]

    def __increase_feature_count(self, vector, count = 1):
        vector_number = self.__get_vector_number(vector)

        if vector_number in self.__features:
            self.__features[vector_number] += count
        else:
            self.__features[vector_number] = count

extract_features.py is the main executable script that loads the email messages from the corpora and feeds them into the feature extractor class. It requires three parameters: the directory where the corpora are located, the directory where the features will be written, and the model (or feature set) to use (run the program with no parameters for usage). The corpora directory must contain one subdirectory per category, and each category must in turn be split into group directories; in this case the data was divided into 10 groups. (The categories and groups can be changed by editing the corpus_list and dir_list arrays.) The program processes each category and group in turn, writing one feature file per group. (Note: requires Python 3.)
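For example, a run with the trigram model might look like the following sketch (the paths are hypothetical, and the feature numbers in the sample output line are illustrative); each output line follows the SVM-multiclass input format written by the code above:

./extract_features.py --corpora=./corpora --features=./features --model=trigram

# Expected layout of the corpora directory (one subdirectory per
# category, each split into 10 groups):
#   corpora/ham/01 ... corpora/ham/10
#   corpora/spam/01 ... corpora/spam/10
#   corpora/phish/01 ... corpora/phish/10

# Sample line from features/trigram/features.ham.01 (class id,
# feature:count pairs, then the source file as a trailing comment):
#   1 3:2 17:1 42:5 # ham/01/0001.eml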

extract_features.py

#!/usr/bin/env python
'''
Created on Jul 5, 2011

@author: alberto

This script will extract features from email corpora in 10-way split
directories. It will save a file for each of the 10 directories based on
the model name given.
'''

# Imports
import sys
import os
import email
from feature_extractor import EmailFeatureExtractor

# Get the paths to the data from the command line arguments
corpora = None
feature_dir = None
model = "default"

# Initialize the feature extractor
extractor = EmailFeatureExtractor()

# Flags understood by the feature extractor, and the set of flags each
# model (feature set) turns on; any flag not listed for a model is
# turned off.
all_flags = ["doUnigram", "doBigram", "doTrigram", "doHeaders",
             "doPartCounts", "doBody", "doShortBody", "separateHtml",
             "stripHtml", "doUrlFeatures"]
model_flags = {
    "unigram": ["doUnigram", "doBody"],
    "bigram": ["doUnigram", "doBigram", "doBody"],
    "trigram": ["doUnigram", "doBigram", "doTrigram", "doBody"],
    "headers_only": ["doUnigram", "doBigram", "doTrigram", "doHeaders"],
    "headers_body": ["doUnigram", "doBigram", "doTrigram", "doHeaders",
                     "doBody"],
    "headers_body_count": ["doUnigram", "doBigram", "doTrigram",
                           "doHeaders", "doPartCounts", "doBody"],
    "headers_body_split": ["doUnigram", "doBigram", "doTrigram",
                           "doHeaders", "doPartCounts", "doBody",
                           "separateHtml"],
    "headers_body_split_strip": ["doUnigram", "doBigram", "doTrigram",
                                 "doHeaders", "doPartCounts", "doBody",
                                 "separateHtml", "stripHtml"],
    "short_body": ["doUnigram", "doBigram", "doTrigram", "doShortBody"],
    "headers_short_body": ["doUnigram", "doBigram", "doTrigram",
                           "doHeaders", "doPartCounts", "doShortBody"],
    "headers_short_body_strip": ["doUnigram", "doBigram", "doTrigram",
                                 "doHeaders", "doPartCounts",
                                 "doShortBody", "stripHtml"],
    "headers_short_body_split_strip": ["doUnigram", "doBigram",
                                       "doTrigram", "doHeaders",
                                       "doShortBody", "separateHtml",
                                       "stripHtml"],
    "full": ["doUnigram", "doBigram", "doTrigram", "doHeaders",
             "doPartCounts", "doBody", "separateHtml", "stripHtml",
             "doUrlFeatures"],
    "full_short": ["doUnigram", "doBigram", "doTrigram", "doHeaders",
                   "doPartCounts", "doShortBody", "separateHtml",
                   "stripHtml", "doUrlFeatures"],
    "full_short_no_split": ["doUnigram", "doBigram", "doTrigram",
                            "doHeaders", "doPartCounts", "doShortBody",
                            "stripHtml", "doUrlFeatures"],
    "full_short_no_split_no_count": ["doUnigram", "doBigram", "doTrigram",
                                     "doHeaders", "doShortBody",
                                     "stripHtml", "doUrlFeatures"],
    "full_short_no_split_no_headers": ["doUnigram", "doBigram",
                                       "doTrigram", "doPartCounts",
                                       "doShortBody", "stripHtml",
                                       "doUrlFeatures"],
}

# Go through the arguments
for raw_arg in sys.argv[1:]:
    # Split the argument on the first =
    arg = raw_arg.split("=", 1)
    if len(arg) == 2:
        # See which argument we have
        if arg[0] == "--corpora":
            if os.path.isdir(arg[1]):
                corpora = arg[1]
        elif arg[0] == "--features":
            if os.path.isdir(arg[1]):
                feature_dir = arg[1]
        elif arg[0] == "--model":
            # Look up the requested model and set the extractor flags
            if arg[1] not in model_flags:
                print("Unknown model: " + arg[1])
                exit(1)
            model = arg[1]
            for flag in all_flags:
                setattr(extractor, flag, flag in model_flags[model])

if (corpora == None) or (feature_dir == None):

120 print("Error Invalid arguments.") print("") print ("Usage: " + sys.argv[0] + \ " --corpora=/path/to/split/corpora" + \ " --features=/path/to/split/features --model=model_name" + \ " --model=model") exit(1) print("Using " + model + " model")

# Define the corpus directories
corpus_list = ["ham", "spam", "phish"]
dir_list = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10"]

# Go through the corpus directories
class_id = 0
for corpus in corpus_list:
    # Increase the class id value
    class_id += 1

    # Go through the directories
    for dir in dir_list:
        # Make sure this directory exists
        path = corpora + "/" + corpus + "/" + dir
        save_path = feature_dir + "/" + model + "/"
        if not os.path.isdir(path):
            print("Error: " + path + " directory does not exist")
            exit(1)

        if not os.path.isdir(save_path):
            os.mkdir(save_path)

print("Extracting features in " + corpus + "/" + dir + " ...")

        # Open the file where the features will be stored
        features_fp = open(save_path + "features." + corpus + "." + str(dir), "w")

        # Go through each file in the directory
        for file in os.listdir(path):
            # Load the email
            fp = open(path + "/" + file, "rb")
            parse_ok = False
            #print("  " + file)
            try:
                message = email.message_from_binary_file(fp)
                parse_ok = True
            except:
                print("Could not parse " + corpus + "/" + file)
                features = {}
                features[1] = 1

            # Get the features
            if parse_ok:
                features = extractor.extract(message)

            # Write out the features
            features_fp.write(str(class_id))
            for feature_number in sorted(features.keys()):
                features_fp.write(" " + str(feature_number) + ":" +
                                  str(features[feature_number]))
            features_fp.write(" # " + corpus + "/" + dir + "/" + file + "\n")

            fp.close()

        # Close the file
        features_fp.close()

#import pprint
#pprint.pprint(extractor.getVectors())

The following script is an example of the shell scripts I used to run the 10-way cross-validation; this particular script performs the three-way (ham, spam, phish) classification test. It shows how the per-group feature files produced by the feature extractor were combined into training and validation sets, how the training set was fed to svm_multiclass_learn to create the model, and how the held-out validation set was then processed with svm_multiclass_classify so the script could tally the results.

There were four of these scripts, one for each phase of testing. They are mostly the same, differing only in how they consolidate the features. For example, the two-way filter used sed to change the category id of 3 to 2 for all phishing samples, as sketched below.
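The following is a minimal sketch of that relabeling step, assuming the same variable names as the script below; the exact command varied by phase. Since the class identifier is the first field of each feature line, remapping phish (class 3) to spam (class 2) is a one-line stream edit:

# Relabel phishing samples as spam before adding them to the training set
sed 's/^3 /2 /' "$corpora_dir/phish/$model.tokens.$group" >> "$tmp_dir/$model.train"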

Sample 10-way validation script

#!/bin/bash

# This script will run a 10-way cross-validation on ham, spam and phish using
# the unigram + bigram + trigram model, keeping all three class identifiers
# (three-way classification).

corpora_dir="corpora" tmp_dir="tmp" svm_dir="svm" model="test" classes="ham phish spam" groups="01 02 03 04 05 06 07 08 09 10" results_ham[1]=0 results_ham[2]=0 results_ham[3]=0 results_phish[1]=0 results_phish[2]=0 results_phish[3]=0 results_spam[1]=0 results_spam[2]=0 results_spam[3]=0

# Make sure things exist
if [ ! -e "$corpora_dir/ham/$model.tokens.01" ]; then
    echo "Could not find tokens"
    exit
fi

# Go through each group
for run in $groups; do
    echo "Performing run #$run"

    # Delete the files in the temp directory
    for file in $tmp_dir/*; do
        rm $file
    done

    # Go through each class and combine the tokens
    for class in $classes; do
        echo "  Combining tokens in class $class"

        # Go through each group
        for group in $groups; do
            # See if this group is the same as the run
            if [ "$group" == "$run" ]; then
                cat "$corpora_dir/$class/$model.tokens.$group" > \
                    "$tmp_dir/$class.$model.validation"
            else
                cat "$corpora_dir/$class/$model.tokens.$group" >> \
                    "$tmp_dir/$model.train"
            fi
        done
    done

    # Train the SVM model
    echo "  Training the SVM model"
    "$svm_dir/svm_multiclass_learn" -c 5000 -v 0 "$tmp_dir/$model.train" \
        "$tmp_dir/$model.svm_model"

    # Go through each class
    for class in $classes; do
        echo "  Testing $class"
        "$svm_dir/svm_multiclass_classify" -v 1 \
            "$tmp_dir/$class.$model.validation" "$tmp_dir/$model.svm_model" \
            "$tmp_dir/classification_results"

        # Count how many messages of this class were assigned to each class
        for i in 1 2 3; do
            count=`cut -d " " -f 1 "$tmp_dir/classification_results" | \
                grep $i | wc -l`
            case $class in
                "ham" ) results_ham[$i]=$(( ${results_ham[$i]} + $count )) ;;
                "phish" ) results_phish[$i]=$(( ${results_phish[$i]} + $count )) ;;
                "spam" ) results_spam[$i]=$(( ${results_spam[$i]} + $count )) ;;
            esac
        done
    done
done

# Print the results (rows: actual class; columns: predicted class)
echo ""
echo "-----===== RESULTS =====-----"
echo "        ham     phish   spam"
echo "ham     ${results_ham[1]}   ${results_ham[2]}   ${results_ham[3]}"
echo "phish   ${results_phish[1]}   ${results_phish[2]}   ${results_phish[3]}"
echo "spam    ${results_spam[1]}   ${results_spam[2]}   ${results_spam[3]}"
