Brad-Wardman-Dissertation1.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
A SERIES OF METHODS FOR THE SYSTEMATIC REDUCTION OF PHISHING by BRADLEY WARDMAN ANTHONY SKJELLUM, COMMITTEE CHAIR ALAN SPRAGUE CHENGCUI ZHANG JOSE NAZARIO JOSEPH POPINSKI A DISSERTATION Submitted to graduate faculty of the University of Alabama at Birmingham, in partial fulfillment of the requirements for the degree of Doctor of Philosophy BIRMINGHAM, ALABAMA 2011 Copyright by Brad Wardman 2011 A SERIES OF METHODS FOR THE SYSTEMATIC REDUCTION OF PHISHING BRADLEY WARDMAN COMPUTER AND INFORMATION SCIENCES ABSTRACT Phishing continues to expand as efforts to thwart attacks are ineffective and criminals behind these scams operate with apparent impunity. In order to address both issues, this research provides three steps towards the reduction of phishing: identifying phishing websites, collecting phishing evidence, and correlating the phishing incidents. The first step is to identify phishing websites automatically. Experimental results demonstrate that content-based algorithms can classify phishing websites with greater than 90% detection rates while maintaining low false-positive rates. Next, the development of custom software collects additional information and evidence about these phishing websites. In the final step, this research offers two novel algorithms to be employed as clustering metrics for phishing website content. The three steps in this research reduce phishing by blocking potential victims from the malicious content through email filters and browser-based toolbars, gathering evidence against the criminal(s) that is usable by incident investigators, and revealing relationships between phishing websites that can provide investigators with deeper knowledge of phishing activity and thus help to prioritize their apparently, limited resources. i DEDICATION To my loving wife and supportive family, thank you ii TABLE OF CONTENTS 1. Introduction ..................................................................................................................1 1.1 Introduction Statement ..........................................................................................1 1.2 Problem Statement ................................................................................................2 1.2.1 Victim Activities ............................................................................................8 1.2.2 Criminal Activities .......................................................................................12 1.3 Solutions ..............................................................................................................12 1.3.1 New Contributions to Victim Activity .........................................................13 1.3.2 New Contributions to Criminal Activity......................................................17 1.3.3 Reduction of Victim Website Compromise Through Identification of Vulnerable Applications ...................................................18 1.4 Overall Contributions ..........................................................................................19 1.5 Metrics .................................................................................................................21 1.6 Extensibility ........................................................................................................22 1.7 Outline of Dissertation ........................................................................................22 2. Related Work .............................................................................................................25 2.1 Why Fall for Phish? .............................................................................................25 2.2 User Education ....................................................................................................27 2.3 Blacklists & Toolbars ..........................................................................................28 2.4 Email-Based ........................................................................................................31 iii 2.5 URL-Based ..........................................................................................................36 2.6 Content-Based .....................................................................................................39 2.7 Content-Based and URL- Based .........................................................................41 2.7 Phishing Prosecutions .........................................................................................44 2.8 Phishing activity aggregation ..............................................................................46 3. Victim Activities ........................................................................................................48 3.1 Characterization of phishing victims ...................................................................48 3.2 Contributions to victim activities ........................................................................49 3.2.1 Content-based approaches ...........................................................................50 3.2.2 URL-based approaches ................................................................................50 3.3 Current Problems.................................................................................................51 3.3.1 Human verification and takedowns .............................................................51 3.3.2 Blacklists ......................................................................................................54 3.3.3 Fast Phish Detection through Automated Solutions ....................................56 3.4 Content-Based Solutions To Fast Phish Detection .............................................57 3.4.1 Initial Process ...............................................................................................57 3.4.2 Main Index Matching ...................................................................................58 3.4.3 Deep MD5 Matching ...................................................................................60 3.4.4 The UAB Phishing Data Mine .....................................................................70 3.4.5 Partial MD5 Matching .................................................................................74 iv 3.4.6 Deeper Analysis of file Differences .............................................................77 3.4.7 Content-Based Experiments .........................................................................81 3.5 Metrics ...............................................................................................................117 3.6 Summary ...........................................................................................................119 4. Criminal Activities ...................................................................................................121 4.1 Introduction to criminal activities .....................................................................121 4.2 Contributions to deterrence of criminal activities .............................................122 4.3 Gathering phishing evidence .............................................................................123 4.3.1 Phishing Kit Scraper .................................................................................123 4.3.2 Kit Email Extractor ...................................................................................124 4.3.3 Summary of gathering evidence ................................................................133 4.4 Clustering phishing websites .............................................................................133 4.4.1 Initial Deep MD5 Clustering .....................................................................134 4.4.2 Reeling in Big Phish with a Deep MD5 Net ..............................................134 4.4.3 SLINK-style Deep MD5 Clustering ..........................................................147 4.4.4 Attribution of phishing website to kits.......................................................151 4.4.5 Conclusions on Deep MD5 Clustering ......................................................152 4.4.6 Fingerprinting Files: New Tackle to Catch A Phisher ..............................152 4.5 Conclusions .......................................................................................................162 5. Future Work and Conclusions .................................................................................163 v 5.1 Contributions .....................................................................................................164 5.2 Future Work and Extensibility ..........................................................................167 5.2.1 Future Work ...............................................................................................168 5.2.2 Extensibility of Syntactical Fingerprinting ................................................175 5.3 Conclusions .......................................................................................................179 Appendix A – URL-Based Solutions to Phish Detection ................................................182 A.1 Initial Study – Identifying Vulnerable Websites by Analysis of Common Strings in Phishing URLs ..........................................................182 A.1.1 INTRODUCTION .....................................................................................182 A.1.2 Related Work .............................................................................................184 A.1.3 Method .......................................................................................................187