Air Force Institute of Technology AFIT Scholar

Theses and Dissertations Student Graduate Works

9-15-2011 Malware Collection System: An Automated URL Extraction and Examination Platform Benjamin B. Kuhar

Follow this and additional works at: https://scholar.afit.edu/etd Part of the Computer and Systems Architecture Commons, and the Digital Communications and Networking Commons

Recommended Citation Kuhar, Benjamin B., "Twitter Malware Collection System: An Automated URL Extraction and Examination Platform" (2011). Theses and Dissertations. 1405. https://scholar.afit.edu/etd/1405

This Thesis is brought to you for free and open access by the Student Graduate Works at AFIT Scholar. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of AFIT Scholar. For more information, please contact [email protected].

TWITTER MALWARE COLLECTION SYSTEM: AN AUTOMATED URL EXTRACTION AND EXAMINATION PLATFORM

THESIS

Benjamin B. Kuhar

AFIT/GCO/ENG/11-07

DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE OF TECHNOLOGY

Wright-Patterson Air Force Base, Ohio

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED

The views expressed in this thesis are those of the author and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the U.S.

Government.

This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States.

AFIT/GCO/ENG/11-07

TWITTER MALWARE COLLECTION SYSTEM: AN AUTOMATED URL EXTRACTION AND EXAMINATION PLATFORM

THESIS

Presented to the Faculty

Department of Electrical and Computer Engineering

Graduate School of Engineering and Management

Air Force Institute of Technology

Air University

Air Education and Training Command

In Partial Fulfillment of the Requirements for the

Degree of Master of Science

Benjamin B. Kuhar, BS

August 2011

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED

AFIT/GCO/ENG/11-07

Abstract

As the world becomes more interconnected through various technological services and methods, the threat of malware is increasingly looming overhead. One avenue in particular that is examined in this research is the social networking service Twitter.

This research develops the Twitter Malware Collection System (TMCS). This system gathers Uniform Resource Locators () posted on Twitter and scans them to determine if any are hosting malware. This scanning process is performed by a cluster of

Virtual Machines (VMs) running a specified software configuration and the execution prevention system known as ESCAPE which detects malicious code. When a URL is detected by a TMCS VM instance to be hosting malware, a dump of the web browser used is created to determine what kind of malicious activity has taken place and also how this activity was allowed.

After collecting over a period of 40 days, and processing a total of 466,237 URLs twice in two different configurations, one consisting of a vulnerable Windows XP SP2 setup and the other consisting of a fully patched and updated Windows Vista setup, a total of 2,989 dumps were created by TMCS based on the results generated by ESCAPE.

iv

Acknowledgments

I would like to thank my good friends and family members both near and far.

Without you, I don’t believe this research effort would have been possible.

v

Table of Contents

Table of Contents ...... vi List of Figures ...... ix List of Tables ...... x I. Introduction ...... 1 1.1 Problem Background ...... 1 1.2 Goals ...... 2 1.3 Document Outline ...... 2 II. Literature Review ...... 4 2.1 Malware Overview ...... 4 2.1.1 Defining Malware ...... 4 2.1.2 Trojan Horses ...... 5 2.1.3 Rogueware ...... 5 2.2 Malware and Exploit Collection Systems ...... 5 2.2.1 Strider HoneyMonkey ...... 6 2.2.2 SpyProxy ...... 8 2.2.3 HoneyIM ...... 9 2.2.4 Caffeine Monkey ...... 11 2.4 Payload Delivery Methods ...... 13 2.4.2 Clickjacking ...... 13 2.4.3 Drive-By Download ...... 14 2.5 Malware Delivery and Execution Prevention ...... 15 2.5.1 Data Execution Prevention ...... 15 2.5.1 Nozzle ...... 16 2.5.2 Gatekeeper ...... 17 2.6 History of Twitter Vulnerabilities ...... 18 2.6.1 SMS Authentication Vulnerability ...... 18 2.6.2 Clickjacking Vulnerability ...... 19 2.6.3 XSS Worms ...... 19 2.6.4 MouseOver Vulnerability ...... 19 2.7 Twitter’s Malware Countermeasures ...... 20

vi

2.7.1 Malicious URL Filtering ...... 20 2.7.2 Additional Filtering after Bit.ly Partnership ...... 20 2.8 Summary ...... 20 III. Methodology ...... 21 3.1 Problem Definition ...... 21 3.1.1 Goals ...... 21 3.1.2 Approach ...... 22 3.2 System Boundaries ...... 23 3.3 System Services ...... 24 3.3.1 Status Collection ...... 24 3.3.2 URL Extraction from Statuses ...... 25 3.3.3 Storage of Extracted URLs ...... 25 3.3.4 URL Unshortening ...... 25 3.3.5 URL Processing through the TEH module ...... 26 3.4 Metrics ...... 28 3.5 Parameters ...... 28 3.5.1 System Parameters...... 28 3.5.2 Workload Parameters...... 30 3.6 Factors ...... 31 3.6.1 Considered but excluded factors...... 31 3.6.2 VM Software Configurations...... 31 3.7 Evaluation Technique ...... 32 3.8 Workload ...... 33 3.9 Experimental Design ...... 33 3.10 Methodology Summary ...... 33 IV. Results ...... 34 4.1 Total Statuses Examined ...... 34 4.2 Total URLs Collected ...... 35 4.3 Total URLs Determined To Contain Malware ...... 36 4.3.1 Windows XP Configuration Results...... 36 4.3.2 Windows Vista Configuration Results...... 37

vii

4.4 Examining Vista and XP Results ...... 38 4.5 Processing Errors ...... 41 4.6 Manually Identified Malware ...... 42 4.7 Results Summary ...... 44 V. Conclusions ...... 45 5.1 Accomplishments ...... 45 5.2 Contributions ...... 45 5.3 Future Work ...... 45 5.3.1 Integration of User Input ...... 45 5.3.2 Prescreening Application ...... 46 5.3.3 Expanding Social Network Compatibility ...... 46 5.3.4 Incorporating Additional Detection Methods ...... 46 5.3.5 Automate Analysis of Captures ...... 47 5.3.6 Processing URLs in Parallel within a Single VM ...... 47 5.3.7 Develop Focus on Semantically Relevant Information ...... 47 5.5 Conclusion ...... 47 Bibliography ...... 49

viii

List of Figures

Figure 1: Twitter Malware Collection System ...... 24

Figure 2: Total Statuses Examined ...... 34

Figure 3: Total URLs Collected ...... 35

Figure 4: Potentially Malicious XP URLs ...... 37

Figure 5: Potentially Malicous Vista URLs ...... 38

Figure 6: XP vs. Vista Box Plot ...... 39

Figure 7: XP Ratios ...... 40

Figure 8: Vista Ratios ...... 40

Figure 9: Identified Infection...... 42

Figure 10: Cycbot Packet Capture ...... 43

Figure 11: Cycbot Trojan’s “conhost.exe” ...... 44

ix

List of Tables

Table 1: Chosen Factor and Levels ...... 32

Table 2: Windows XP ESCAPE Results ...... 36

Table 3: Windows Vista ESCAPE Results ...... 37

Table 4: Processing Errors...... 41

x

TWITTER MALWARE COLLECTION SYSTEM: AN AUTOMATED URL EXTRACTION AND EXAMINATION PLATFORM

I. Introduction

Imagine for a moment that a famous comedic movie actor has a Twitter account with a massive number of followers, say, in excess of 1,200,000. These followers all receive updates when this actor posts an update to Twitter. With this large of a following, something posted by this actor is “heard” by a large audience. Now, assume that someone has come up with a scheme to steal from people using a method that requires a significant number of trusting people is able to gain control of the actor’s

Twitter account. This thief, now having the ability to send a message to over 1.2 million users, decides to take advantage of this massive group of admirers by posting a link that is said to contain a screensaver for the actor’s upcoming new movie. The link instead contains a piece malware that steals from the victim’s banking account. There’s no need to imagine, this event has actually taken place. The actor Simon Pegg’s Twitter account was hijacked and used to spread a banking Trojan [1]; and this is not the first time such an event has occurred and will likely not be the last time.

1.1 Problem Background

As consumer technology advances and becomes more affordable, more users are able to experience the amazing and revolutionary things made possible that were not previously. The global Internet is an example of such technology. Through the use of the Internet, information spreads to far away destinations at incredible speeds. This world-wide connection hosts both legitimate and illegitimate actions. A website can host

1

an online store through which tangible products can be purchased and shipped half way across the globe while another website can deceive users into downloading and installing fake anti-virus scanning software. Phone calls can be made using the Voice over Internet

Protocol to family members across the ocean or malicious executables can be spread through a link contained within a status posted to the social networking service Twitter.

As time marches on, the ability to discern whether or not something stumbled upon while using the Internet as advantageous or malicious becomes increasingly difficult. The reason for this is that the creators of the illegitimate actions are progressively able to take advantage of not only a computing system, but also the user of the system. Determining ways to deal with these sorts of malicious actions proves problematic and requires much effort and thought.

1.2 Goals

The primary goal of this research is to create a system to scan hyperlinks posted on the social networking service Twitter. The system determines whether or not malicious activity is detected at these links.

An additional goal of this research is to create an archive of the offenders that are deemed malicious to study the means and methods by which the malicious activity takes place.

1.3 Document Outline

Chapter 2 of this document provides a literature review relevant to this research effort. Chapter 3 provides the experimental methodology for the system. Chapter 4

2

provides analysis on the results generated by the system. Chapter 5 provides a concluding summary of the findings of the system as well as future research endeavors.

3

II. Literature Review

2.1 Malware Overview

This section of chapter two provides a short history of malware as well providing some pertinent definitions.

2.1.1 Defining Malware

Malware, short for malicious software, is software whose purpose is to exfiltrate data or cause damage to one or more computer systems without the system owner’s explicit permission [2]. The existence of malware is common knowledge as the media frequently describes wide-spread attacks [3].

Malicious software itself, though, is not new. It has become more prevalent due to the wide spread connectivity information via personal computers and the Internet [3].

One of the earliest notable pieces of malicious software is the fork bomb. A fork bomb is a program or shell script that rapidly creates new processes via the fork() system call.

The goal of a fork bomb is to consume entries in the process table and thereby cause a denial of service which will bring the affected computer to a halt [4].

Malware is increasingly becoming more sophisticated, stealthy, and even weapon- like as was recently seen with the so-called “cyber weapon” Stuxnet [5]. Malware authors take advantage of things such as delays between patch creation and patch installation, user susceptibility to social engineering, and the likeliness of users to pursue

“attractive” material. These authors realize that systems, as well as their users, are vulnerable.

4

2.1.2 Trojan Horses

A Trojan Horse is a piece of software that appears legitimate to the user, but contains unknown functionality which can be leveraged in order to gain a level of control on the victim’s computer [6]. This type of malware, along with others that provide unauthorized access to a user’s computer, have the potential for dire consequences.

2.1.3 Rogueware

Within the past three years, fake anti-virus products, also known as rogueware, have skyrocketed in popularity among malware distributors. These pieces of rogueware trick the affected user into paying money for a license to remove what the programs identify as “infected” files. As a result, the false licenses that are purchased add up to a significant amount of capital for the malware distributors. The ability to play on the fears of people rather than the vulnerabilities contained within a user’s system itself allows the creators of these rogueware items to scam bystanders for their money [7].

2.2 Malware and Exploit Collection Systems

There are many different variations of malware and exploit collectors, or crawlers, in existence. Of particular interest with regards to this research effort are the ones that use crawlers to identify Uniform Resource Locators (URLs) with malicious content. Some of these crawlers are explored in the following sections to determine how they are useful in different applications and how similar methods may be applied to this research.

5

2.2.1 Strider HoneyMonkey

The Strider HoneyMonkey project is a Microsoft sponsored system. This system visits various websites with the intent of finding zero-day exploits as well as established exploits that can compromise an unpatched system [8].

The Strider system checks for the illegitimate and unsanctioned creation of files and system configuration changes and is combined with a HoneyMonkey, which is essentially a proactive honeypot, to determine when an exploit has successfully executed.

With this system, multiple variations of the HoneyMonkey execute on different Virtual

Machines to test different levels of patches and the levels of “aggressiveness” from various websites [8].

The HoneyMonkey system detects exploits through a three step process. The first step, known as scalable mode, visits a configured number of URLs at the same time from a single virtual machine. If an exploit is detected, the system will check one URL per virtual machine and re-test each of those URLs to determine which specific URL contains the exploit. In the second step, the HoneyMonkeys determines what pages are malicious through recursive redirection analysis by examining the URLs contained within the initial page. Then, in step three, HoneyMonkeys continuously scan the results from step two within fully updated virtual machines to determine if any exploits are leveraging zero-days [8].

Since signature-based detection tends to be a cat and mouse process, Strider

HoneyMonkey uses a black-box non-signature based approach. A HoneyMonkey is run that launches a new instance of a browser to defeat any code containing a timer that delays execution of said code. Since user interaction is not incorporated into the

6

HoneyMonkey, any modifications made to the system outside of the browser’s authorized area of operation indicate that an exploit has successfully run. This approach detects exploits that leverage both known and unknown vulnerabilities. After the HoneyMonkey has visited a requested URL, the virtual machine is examined to detect if there are any noticeable executables created, if any files have been modified outside the permitted folders, if any new processes have been created, if any windows registry configuration changes have occurred including both the addition and modification of keys, if any known vulnerabilities have been leveraged, and finally, if any redirect-URLs have been visited [8].

Within redirection analysis, a large number of URLs deemed malicious in step one were content providers serving up attention-getting items to lure in potential victims.

If successful, traffic is redirected to the actual exploit providers which infect or compromise the victims’ machines [8].

In generating URLs to crawl, the Strider HoneyMonkey team used URLs from a search for sites that were known to host malicious content, a search for hosts files, and a further crawling of the exploit containing URLs discovered from these two groups [8].

Of the 16,190 URLs generated from URLs suspected to host malicious content in step one, 207 of these URLs, which equates to approximately 1%, were identified as containing exploits [8]. For the top 1,000,000 websites that were examined, based on their popularity rankings, 710 of these URLs were found to contain exploits. In step two, once recursive redirection analysis had taken place, the list of malicious URLs from the suspicious list had increased by 263% to 752 URLs [8]. For the popular site list, the number of exploit URLs increased to 1,036, or 46% [8]. In stage three, one of the virtual

7

machines successfully captured a zero-day exploit, the JView [8] profiler in javaprxy.dll, when instantiated within Internet Explorer contained a remote code execution vulnerability totally compromised the victim’s system [9].

2.2.2 SpyProxy

SpyProxy is an extended web proxy system that protects users from malicious

URLs. It consists of a system containing virtual machines to process requested URLs on- the-fly in a similar manner to previously discussed Strider HoneyMonkey. It is unique in that it functions within a proxy server and serves as a defense platform rather than just a measurement tool. SpyProxy ideally should keep clients safe from malicious content and it also should not reduce the usability of a browser, for example, by generating large delays between requests. SpyProxy downloads content on the requesting client’s behalf and evaluates it to verify whether the URL is malicious or benign. During experimental trial runs of SpyProxy, the average delay from when a client requests a URL to when the client’s browser begins to render was a surprising 600ms [10].

SpyProxy first performs a static analysis of the requested URL. If it is unable to identify or process an object, which would be the case for any non-HTML content types, it forwards the object to a virtual machine which visits the URL and checks for any system state changes such as newly created processes, modifications to the file system, registry modifications, or Operating System crashes [10].

Various optimizations have increased the performance of the SpyProxy system.

One of the first is the caching of the post-security checking of the requested page which produces a hit rate that is competitive with a typical web cache. These hits are only

8

generated if all of the requested content is served from the proxy cache which eliminates the possibility of an unexpected outcome from as dynamically generated content [10].

Another optimization to the system is pre-fetching requested content for the client delaying execution until the SpyProxy system allows the page to be rendered. A further optimization technique is the periodic release of content to the client from the proxy by processing part of a page and then immediately sending it to the client if it is non- malicious thereby making the process more streamlined [10].

2.2.3 HoneyIM

Instant Messaging (IM) malware can spread very quickly making it a significant security risk for users. A variant of the Kelvir worm caused Reuters to disable its Instant

Messaging service back in 2005. Two main methods of malware spreading through IM clients are URLs linking to malicious websites and file transferring. Once a machine has been compromised, the malware spreads by sending similar messages with malicious

URLs or through file transfer to the users on the infected client’s buddy list which spreads the malware at an exponential rate [11].

There are some protection schemes that enhance IM security such as using

CAPTCHA to counter the spread of IM malware. The burden of such security can dissuade a user from using the service [11].

The HoneyIM system detects the spread of malware through “dummy buddies” on a users buddy list. This essentially eliminates false positives since fake buddies should never receive messages from a legitimate user. HoneyIM is based on the open- source IM client, Pidgin, and the client honeypot, Capture. In simulated executions of

9

HoneyIM with only 5% of total users comprised as fakes on the Instant Message network, HoneyIM was able to detect the spreading malware after only 0.4% of the users, on average, had been infected [11].

The HoneyIM system has four components: the communication module, the detection module, the suppression module, and the notification module. The communication module parses IM traffic sent to decoy buddies on the network. It relays messages to the detection module which determines whether a URL was included in a message or if a file transfer request was made. This module then notifies the suppression module which examines network traffic and filters out messages with malicious intent.

The notification module alerts network administrators when spreading malware has been identified [11].

The communication module supports all of the functions a normal IM client supports and also supports all of the various IM protocols available. The detection module identifies clients as compromised if a URL or file transfer request is received.

Encryption will not circumvent the detection module as the final message received within an IM conversation must be in plain-text. If a URL is received by the detection module, it can use HoneyMonkey to detect system anomalies post visit. Taint analysis could also determine if an executable can compromise the system and generate a signature of the file. The suppression module acts as a network filter by denying traffic generated from identified offensive clients. The notification module informs network administrators when malware has been detected [11].

10

2.2.4 Caffeine Monkey

Caffeine Monkey contains a JavaScript engine “based on extensions to the open source Spidermonkey JavaScript implementation” [12]. It uses a MySQL database to store retrieved documents, the results of analysis, as well as organizing crawls [12].

Although the Caffeine Monkey focuses on JavaScript, similar ideas can be applied to other scripting languages. Techniques to obfuscate the true functionality of scripts are numerous. One obfuscation technique is called whitespace randomization.

This simple method omits whitespace which causes the scripts to appear different when traversing the internet but retains the exact same functionality to be performed. This technique does not hide what the scripts are actually doing during execution but rather changes the raw data which could defeat filters that match content [12]. Similarly, comment manipulation serves a very similar purpose as whitespace randomization. It leaves the code unmodified but changes the binary representation of the script which could potentially fool systems put in place to detect malicious activity. Comments could also be used to bewilder a human analyst examining the code by giving inaccurate or misleading guidance [12].

String obfuscation is another abashment-inducing technique which ranges from various encoding methods, to XOR functions, to Caesar Ciphers. These techniques render signature based detection impractical as a result of the myriad number of possible combinations in which strings can be represented and still induce the same result [12].

Variable name randomization and function pointer reassignment reassigns objects to a different variable or function with the intent of obfuscating the true functionality of the actual variables and functions utilized. Real-time security devices would not be able

11

to tell the difference between the built-in functions of JavaScript and a user-created function of the same name [12].

Integer obfuscation bypasses security mechanisms looking for memory addresses.

For example, the address “’0x04000000’ could be expressed as 16,777,216 * 42, or any number of other ways” [12].

Block randomization changes the structure of a script’s statements and changes the code syntactically but performs the same actions. This is a more sophisticated technique that alters if/else and loop constructs. The combination of various obfuscation techniques can make detection potentially very tedious and taxing [12].

Heritrix, an Internet-scale crawler developed by the , is used to collect JavaScript. The Heritrix crawler “collected approximately 225,000 web documents over a continuous period of about three and a half days, with a total yield of

7.9GB” [12]. Of these, 364 documents which comprised 4.5MB (0.2%) of the total data collected were JavaScript files [12].

Once the JavaScript documents were ready for analysis, they were submitted to the Caffeine Monkey JS engine and the runtime logs were examined but provided no malicious results. Four malicious examples were obtained from security researchers of which SecureWorks made requests. These results were scaled to the results from the

MySpace crawl. There was an apparent difference in the ratios of the various actions performed by the benign MySpace JavaScripts and the known malicious JavaScripts.

Benign scripts made much more use of the document.write() method while malicious scripts made more use of string instantiation and objects. In the malicious scripts, DOM

12

(Document Object Model) elements were created with a higher frequency and the eval() function was used less than in the benign scripts [12].

The Caffeine Monkey JS Engine hooked functions that were determined to be the most likely to be obfuscated in JavaScripts and created logs at runtime. Thus, the flow of execution was seen in the logs without the need for script debugging [12].

2.4 Payload Delivery Methods

The following sections describe some of the different ways that malware can be distributed to victim machines.

2.4.2 Clickjacking

Clickjacking occurs when a user is persuaded to mouse click on an unapparent element of a page that has been placed there by the attacker but is not noticeable to the user [13]. This kind of attack could lead to the unintentional and undesired consequences of transfer of funds, interacting with fraudulent advertising, posting messages, or other actions that could be triggered by the click of a mouse [14].

The most common known form of clickjacking uses an invisible iframe overlaid on top of the user’s desired content [13]. For example, JavaScript can be used to align framed content in real time with the user’s mouse cursor which allows an attacker to have the victim perform actions that require multiple clicks [14].

One of the most notable things about clickjacking is that it is not based on an exploitable vulnerability or a bug in a web-based application. It is simply an abuse of features inherent within HTML and Cascading Style Sheets (CSS) [14].

13

An extension named ClickIDS, which as the name suggests is a clickjacking intrustion detection system, in the No-Script browser plug-in, which prevents scripted elements from executing, detects if an event is triggered by an obscured or invisible element after a user has made a click. Currently, an annoying side effect of this plug-in is a large number of false-positives are generated [14].

ClickIDS detects if clickjacking is taking place within a test environment by identifying coordinates on a page where clickable elements exist and then manipulating the mouse to click on each of these individual elements. If two or more elements are overlapping during the click event, a suspicious behavior alert is generated. The

ClickIDS examines both frames and iframes during the detection process [14]. Since it only interacts with elements that are deemed clickable. The possibility of detecting clickjacking that is occurring elsewhere within web pages cannot be done [14].

2.4.3 Drive-By Download

A drive-by download occurs when malicious software is installed on a victim’s machine without his or her knowledge or consent [15]. These types of downloads are successful due to leveraging some vulnerability, which in many cases occur in a web browser.

In 2004, the drive-by download attack known as the “Download.ject attack” successfully compromised a large number of well known business websites. These compromised sites downloaded such software as key loggers as well as Trojan Horses in order to steal user credentials and private information [16].

14

2.5 Malware Delivery and Execution Prevention

The following sections examine different methods to prevent the successful delivery of malicious payloads to a user’s system. When considering how to defend against malware, a lot of questions come to mind. One is how can software automatically be identified as malicious. A second question is what can be done to prevent both the system and the user from acquiring and executing malware. These and other such questions are quite difficult and there tends to be only partial solutions rather than complete answers.

2.5.1 Data Execution Prevention

Data Execution Prevention (DEP) technology, implemented in both hardware and software, examines memory to prevent malware from executing. If running in Physical

Address Extension mode, the hardware version of DEP sets all of the memory regions within a process to be non-executable unless there is code that is recognized as executable. This eliminates malicious code that attempts to execute from these locations by blocking these attempts and throwing an exception [17].

Hardware DEP implementations vary by architecture (AMD utilizes NX, or no- execute page-protection while Intel utilizes XD, or the Execute Disable bit) but serve the same purpose. Usually, DEP is used on a per-virtual-memory-page basis by modifying a bit within the PTE (page table entry) [17].

The software implementation of DEP functions without a direct dependence on hardware. This version of DEP is limited compared to the hardware version. It can prevent Structured Exception Handler (SEH) overwrites and is generally used on

15

computers without the hardware features previously described. Software-based DEP is applied at compile-time and typically limited to Windows system libraries [18].

The main benefit of DEP is the prevention of code execution from pages in memory classified as data, such as the default heap, stacks, and memory pools. DEP will raise an exception when code attempts to run from these defined-data sections and if the exception is unhandled, will terminate the program [17].

There are four configuration options for DEP: OptIn, OptOut, AlwaysOn, and

AlwaysOff. The OptIn option means DEP is enabled for only those applications that are explicitly specified, which by default are the Windows system files. In the OptOut option, DEP is always on and is only off for those applications specified. The AlwaysOn option enables DEP for each individual application, while the AlwaysOff option will not use DEP for any application [17].

Although DEP mitigates the effectiveness of malware, it cannot be solely relied on to protect a system. In previous research, ways were found to bypass DEP. One was to run code from sections of memory that are designated executable to modify the flags of the non-executable memory. This was using ret2libc to call NtSetInformationProcess and the ProcessExecuteFlags which would disable the non-executable support for the calling process [18].

2.5.1 Nozzle

Heap-spraying is a technique that manipulates even type-safe languages to an attacker’s advantage. It creates a large number of objects containing an exploit within a process’s heap. One common method uses a web browser and JavaScript. A website

16

containing malicious JavaScript uses the interpreter to allocate and place malicious objects in the process’s heap [19].

It is quite difficult for standard signature based detection methods to detect heap- spraying as it is easy to represent the same functionality in a variety of ways via techniques such as polymorphism and encoding. Instead of using a signature-based method, Nozzle takes a two-level approach to detect heap-spraying. It scans “objects locally while at the same time maintaining heap health metrics globally” [19].

Nozzle scans objects locally by performing a sandboxed interpretation of heap objects as if they were code and seeing if there are any signs of malicious intent. One of the most notable cues of malicious intent is the detection of a NOP sled. These sleds can contain an arbitrary set of commands as long as their execution does not result in termination or alter the payload [19].

The types of instructions Nozzle deemed invalid are I/O or system calls, interrupts, privileged instructions, and jumps external to the current object’s range.

Nozzle attempts to find objects that modify control flow assuming an attempt to arrange flow so a seemingly random jump will execute the malicious code [19].

2.5.2 Gatekeeper

Gatekeeper enforces reliability and security policies for JavaScript code via a subset of acceptable and safe JavaScript. This subset is based on a collection of 8,379

JavaScript widgets rather than analysis of the language itself [20].

The number of items Gatekeeper prohibits is relatively small. In particular, it bans the use of eval, Function, setTimeout, setInterval, and with. These constructs accept

17

string parameters and then executes said parameters. The first four items present a problem because strings can be interpreted as code but this is not known until run time which makes it impossible for a static analysis to detect. Specifically, the symbol lookup scope could be altered. The reflective constructs of Function.call, Function.apply, and the arguments array are allowed since they can be statically analyzed. One of the most prevalent risky features of JavaScript that needs to be addressed at runtime are innerHTML assignments as well as filed references that are unresolved [20].

During runtime, the safe JavaScript subset and another Gatekeeper subset are used. The difference between the two is the Gatekeeper subset allows non-static field stores as well as innerHTML assignments. The widget is checked against the safe

JavaScript subset. If it fails this test, it is checked against the Gatekeeper subset; if this fails the program is deemed unsafe and will not be considered any further. If either the safe or the gatekeeper subset tests pass, the program undergos pointer analysis and is checked to see if any established policies are broken [20].

2.6 History of Twitter Vulnerabilities

Although the main focus of this research is on the effects URLs external to

Twitter may have on a user browsing said links, the Twitter service itself has been vulnerable to malicious attacks. The following sections examine some of these vulnerabilities.

2.6.1 SMS Authentication Vulnerability

A vulnerability in the Short Message Service (SMS) authentication service, which allows a user to access their Twitter account via text message, was leveraged by spoofing

18

caller ID’s. This is because the caller ID was the only thing used to validate a user. This vulnerability made it possible for anyone who could spoof a caller ID to post a message to an account if they possessed the caller ID information that was tied to the specified account [21].

2.6.2 Clickjacking Vulnerability

In February 2009, a vulnerability was discovered on Twitter that propagated a button labeled “Don’t Click” on a Twitter user’s page without the user manually posting the URL themselves by using an invisible iframe to perform a clickjacking attack. This vulnerability was deemed a “prank” and did not cause the compromise of any accounts

[22].

2.6.3 XSS Worms

In April 2009, many “worms” were found spreading throughout the Twitter service via a Cross-Site Scripting (XSS) vulnerability within the Twitter Profile CSS.

Like the previously mentioned clickjacking vulnerability, these XSS worms automatically posted statuses to a user’s page [23].

2.6.4 MouseOver Vulnerability

In September 2010, a “MouseOver” vulnerability allowed a XSS attack to occur.

The vulnerability triggered when a user would move their mouse over a hyperlink posted within a status. This executed custom CSS or JavaScript which leaves open the possibility for the unintended visiting of URLs hosting malicious content or other such malicious activities [24].

19

2.7 Twitter’s Malware Countermeasures

2.7.1 Malicious URL Filtering

Utilizing ’s Safe Browsing API, Twitter enabled a URL prescreening service that looks for URLs known to host malicious content. If a user were to attempt to post a URL that has previously been identified as malicious, the filter would display a prompt to the user stating "Oops! Your tweet contained a URL to a known malware site!"

[25]. Including this filtering process in the submission of URLs is a positive step forward, but the problem with this approach is that it only detects known malicious URLs rather than both known and unknown malicious URLs.

2.7.2 Additional Filtering after Bit.ly Partnership

In 2010, Twitter and URL shortening service Bit.ly partnered. As part of this partnership, Twitter announced a new URL filtering mechanism. In-depth details on the algorithms and monitoring services used for this filtering are not public [26].

2.8 Summary

This chapter contains relevant definitions for terms related to this thesis, a review of various methods of crawlers that scan for malware, some previously known vulnerabilities found within the Twitter service, as well as measures that Twitter has taken to reduce the potential malware that is present. Using this information, a methodology is created in Chapter 3 to create a system that collects and examines URLs from Twitter for malware.

20

III. Methodology

3.1 Problem Definition

With the release of the Department of Defense’s (DoD) Directive-Type

Memorandum on 25 February 2010, social networking sites have been deemed accessible for use within the various DoD organizations and on the Non-Classified Internet Protocol

Router Network (NIPRNET) [27]. Since these social networking sites have the potential to facilitate the spread of malware, as is described in Chapter 2, determining how great the risk is of contracting malware on these sites and also to develop a system that can potentially prescreen said sites to prevent the compromise of government information systems would be very beneficial.

3.1.1 Goals

The main goal of this research is to develop a system that can automatically gather URLs from statuses, which are commonly referred to as tweets, posted on Twitter, and analyze those URLs to determine whether malware is present at the specified locations. This goal will provide a prototype system to ascertain the risk of being exposed to malware when visiting URLs found using social networking services, and in particular to this research, Twitter.

Another goal of this research is to create a Twitter Malware Repository (TMR) which can be further examined in subsequent studies to identify the different classes of malware present, the methods of delivery of the malware, the specifics of the assumedly various functionality of the malware, and so on.

21

3.1.2 Approach

Twitter statuses are collected via a Python script. A custom , also written in Python, is used to determine the safety of the collected URLs. A database tracks the gathered statuses, the results generated by the crawler, as well as other statistical information.

The status collecting script sends requests to Twitter, using Twitter’s Streaming and Search Application Programming Interfaces (API) to retrieve current statuses based on the most popular trends at the time of collection. The URL examiner determines whether the gathered URLs are using any of the various URL shortening services that are publicly available. URL shortening services are used for statistical tracking, making the sharing of URLs easier by only requiring a relatively short URL (generally the service’s

URL followed by a few characters), and also to make the most of the 140 characters that statuses are limited to on Twitter. If the URL examiner determines that a status-gathered

URL is in fact utilizing a shortening service, it will store the “unshortened” URL in the database for analysis as the full URL is what is actually being visited, besides which the shortened URL has the potential to be reused and reassigned pointing to a different location depending on the service [28]. The custom made crawler visits the URLs contained within the gathered statuses to determine whether the site contains malware.

The results of this study will primarily include drive-by download as described in

Chapter 2 as simulated user-interaction is not within the scope of this study. That is, malware that does not require manual intervention or input is captured.

22

3.2 System Boundaries

The System Under Test (SUT), which is named the Twitter Malware Collection

System (TMCS), is composed of two server-grade rack mounts that provide global

Internet connectivity to Twitter and the URLs investigated. Its data storage ability and virtualization hosting platform allow simultaneous execution and management of a series of Virtual Machines (VMs) on which the different components of TMCS execute (Figure

1).

One of the rack mounts is designated strictly for collection and storage purposes.

A lone VM running on this hardware is responsible for a number of functions. One of the

Python scripts running on this VM, named the Twitter Status Fetcher (TSF), creates the

API requests through the use of the PycURL library, which fetches objects identified by a

URL [29]. The results from the trends query using the Search API are URL encoded and used as a query parameter for the Streaming API which provides statuses based on the given trending parameters. Twitter’s Streaming API responds to this query with a stream of JavaScript Object Notation (JSON) objects. These objects are parsed within the

Python script which determines whether the URLs within the collected statuses contain shortened URLs, composes e-mail messages consisting of the systems current progress and stores the collected URLs as well as the results generated by the crawler components in a file-share and a MySQL database which are housed on the second rack mount.

The Component Under Test (CUT) within the SUT, is named the TMCS

ESCAPE Handler (TEH) module. The TEH module has two main parts: a Python script and the ESCAPE system. The Python script controls much of the actions performed by the VM client and the ESCAPE system traps the execution of malicious code. The parts

23

of the TEH module are described in more detail within the system services section of this chapter.

Figure 1: Twitter Malware Collection System

3.3 System Services

3.3.1 Status Collection

One of the services TMCS provides is the collection of statuses from Twitter. This is accomplished through the TSF module as previously described. The possible results of status collection are: 1.) successful status and 2.) formatting error. The formatting error results in the status being discarded due to malformation.

24

3.3.2 URL Extraction from Statuses

Once a status has been retrieved from Twitter, the TSF module parses the JSON data and extracts any URLs that are contained within. The outcomes of this service are:

1) URL(s) found and 2) No URLs are found.

3.3.3 Storage of Extracted URLs

Extracted URLs are stored within a MySQL database. In conjunction with these extracted URLs are: the Twitter ID associated with the status the URL was found in the

URL itself, and the date the URL was collected. The outcomes of storing the extracted

URLs include: 1) URL(s) successfully stored and 2) URL(s) already in database. When the second outcome occurs, a counter column, which is assigned to each URL, is incremented to keep track of the total times each URL is witnessed.

3.3.4 URL Unshortening

Since a large majority of URLs found in Twitter statuses use a URL shortening service, such as bit.ly, the URLs are “unshortened” to determine what location the shortened URLs point to. This process also allows additional analysis to be performed based on the URLs themselves. Initially, use of the various shortening services’ APIs were considered, but since a myriad of these shortening services are in operation and are found within Twitter statuses, an alternative method was chosen. The URL Unshortener sends a Hypertext Transport Protocol (HTTP) request to the URL in question and if the response received contains an HTTP redirect status code, typically 301 or 302, it searches via a regular expression through that same response’s header and stores the redirection destination URL. The possible outcomes of URL Unshortening are: 1) URL is

25

unshortened and stored and 2) URL is not unshortened. The first outcome occurs when the URL in question uses a URL shortening service or if an external redirection occurs (a redirection to a different domain). The second outcome occurs when a shortening service is not being used, the cannot be resolved, or if a relative redirection occurs

(a redirection using a relative path that is located within the local domain).

3.3.5 URL Processing through the TEH module

The TEH module is responsible for multiple tasks with regards to the processing of URLs gathered by the TSF module. The Python script portion of TEH performs the following actions:

Requesting URLs from the TMCS database for processing

Handling instances of Internet Explorer

o Launching Internet Explorer

o Visiting specified URLs for a specified time

o Closing Internet Explorer

Scanning ESCAPE’s output log for malware positive indicators

Creating and storing dump files of Internet Explorer when malicious code has been

detected

Reverting the current VM to a clean snapshot and restarting through SSH

o When malware has been detected and collected

o When a preset timer has expired

The revert and restart timer value for the script of 30 minutes is chosen to prevent any potentially undetected malware from tainting results and also to clear out any other

26

unforeseen abnormalities that could be experienced while also minimizing the total overhead experienced when reverting and restarting a VM client.

The ESCAPE system portion of TEH actually determines whether or not a URL contains malicious content. The ESCAPE system is comprised of the following:

A kernel-mode driver monitoring code execution

A list of Hash-based Message Authentication Codes (HMACs)

A configuration file

An execution result log file

The list of HMACs is generated by the user on a trusted baseline system, that is, a system comprised of approved files for execution. These HMACs are signatures of the executable portions of the Windows programs allowed to execute. The HMACs are used by the kernel-mode driver to determine when suspected malware attempts to execute.

The following flags produced by ESCAPE are stored in its log file are indicative of malware:

When the Windows Error Reporting feature triggers (werfault.exe)

When unsigned code that is not part of the signed executable attempts to

execute

When unsigned code that is part of the signed executable attempts to execute

When a new process is created

The outcomes of the URL processing are: 1) No potential malware detected and

2) Potential malware detected.

27

3.4 Metrics

The metrics that assess the performance of TMCS are:

Total statuses examined: The total statuses examined determine how quickly

TMCS is able to retrieve statuses to scan through from Twitter.

Total URLs collected: The total URLs collected determine the available pool of potential malware-hosting URLs to be scanned.

Total URLs determined to potentially contain malware: This metric is the one of most interest.

Speed of URL collection: This metric is largely reliant on throttling by Twitter.

Processing Errors: This metric measures the number of errors encountered while processing URLs.

3.5 Parameters

The following system parameters affect system performance including the metrics and system responses.

3.5.1 System Parameters. The system parameters are those parameters that when changed alter the system responses and/or the metrics.

3.5.1.1 Network Utilization. The amount of available network bandwidth consumed corresponds to the number of collected URLs and also the rate at which URLs are visited by the web crawlers. The available networking resources are changed with respect to other research being conducted due to a lack of an independent networking source. In addition, the availability of the social networking service, in this case Twitter, also affects the ability to obtain URLs, although this is not something that can be

28

controlled by TMCS. This parameter is closely linked to the number of active TEH, TSF, and URL Unshortener instances parameters.

3.5.1.2 Number of Active TEH Instances. Since this research is conducted with shared networking resources, the number of active web crawlers also determine how fast URLs are processed and results are gathered.

3.5.1.3 TEH Web Crawler Wait Timers. The time each VM waits on a particular visited URL determines the overall amount of time that must be allotted to process each URL collected. Choosing a length of time appropriate for each web crawler to pause at each URL is influenced by two factors. The first of is the time required to start up an instance of Internet Explorer. This value is manually calculated and 5 seconds was deemed adequate. The second factor is the time to load individual URLs. According to a report by Google, the average page consists of 320KB of data [30]. Given that there are an unknown number of other variables that could affect the wait time of the crawler,

10 seconds was chosen to allow data to download and any other processing, including execution delay timers, to occur when a URL is visited. Processing many include interpreting JavaScript or executing other outlets for malicious payload delivery. This, in total, sets the web crawler wait timer to be 15 seconds per URL visited. A longer timer could have been selected, but the length of time required to wait at each URL directly affects the total time required to process the entirety of the collected URLs.

3.5.1.4 Number of TSF Instances Active. As more instances of the TSF module are run, the total number of URLs scanned simultaneously increases. This number is capped based on the ability to obtain IP and account white listing privileges

29

from Twitter or the use of available IP addresses and accounts. For this research, a white listed status could not be obtained, and only one IP address and one Twitter account were used.

3.5.1.4 Number of URL Unshortener Instances Active. When more instances of the URL Unshortener module are running, the number of fully realized

URLs increases at a greater rate.

3.5.1.5 VM Software Configuration. Since some malware may only affects a particular version of a web browser or a certain patch level of an Operating

System, multiple configurations for the processing VMs are considered. In addition, different versions may benchmark differently than others in terms of speed which has the potential to impact the system in various ways.

3.5.2 Workload Parameters. The workload parameters, which consist of the characteristics of the service requests made to the system are listed below.

3.5.2.1 Duration of URL Collection. This is the total amount of time allotted for the purpose of gathering URLs. The more time available for collection results in more URLs gathered. The chosen length of time for URL collection from statuses was approximately 40 days. This time period allows a relatively large number of statuses to be processed while also allowing enough time for analysis once complete.

This period also determines the total number of URLs that are collected and processed.

3.5.2.2 Number of URLs processed. As the total number of URLs increases, the number of URLs infected with malware also potentially increases.

30

3.6 Factors

3.6.1 Considered but excluded factors

Due to the fact that white listed privileges were not obtained for this research from Twitter, the number of TSF instances was not considered as a factor for testing.

Therefore, the TSF was limited to one active instance as Twitter would only allow one streaming connection at a time. The number of active TEH instances was also considered as a potential factor, but to be courteous and fair to others using the shared networking resources, this value changed from time to time at the operator’s discretion typically with fewer instances running during the day when utilization by others was apparent. Had this not been an issue, a set number would have been chosen to run throughout the entirety of the research process.

The factor chosen to test the performance of the system is:

3.6.2 VM Software Configurations.

Initially, three configurations were selected for the different VM Software

Configurations. However, due to a configuration error, there was realistically only enough time to test two of the VM software configurations. The first configuration is comprised of Windows XP SP2, Adobe Reader 8.0.0, Internet Explorer 7.0.5730.13

Update Versions: 0, and Office 2007 version 1.2.0.4518.1014. The software versions were chosen because they contain known exploitable vulnerabilities. Windows XP is chosen as the Operating System for this configuration as it still has the largest share of

Operating Systems in use, comprising approximately 40% of active Internet computers as identified by user-agents [31]. The second configuration chosen consists of Windows

31

Vista SP2, Adobe Reader X 10.0.1, Internet Explorer 9.0.8112.16421 Update Versions:

RTM (KB982861), and Office 2007 version 12.0.6545.5000. Windows Vista SP2 was chosen as the operating system for this second configuration because at the present time it has been adopted alongside Windows XP for use throughout the Air Force. All versions of the software tested in the Vista configuration are fully patched and up-to-date at the start of testing (Table 1).

Table 1: Chosen Factor and Levels VM Software Configuration 1 Windows XP SP2, Adobe Reader 8.0.0, Internet Explorer 7.0.5730.13 Update Version: 0, Office 2007 ver. 1.2.0.4518.1014 VM Software Configuration 2 Windows Vista SP2, Adobe Reader X 10.0.1, Internet Explorer 9.0.8112.16421 Update Versions: RTM (KB982861), and Office 2007 version 12.0.6545.5000

3.7 Evaluation Technique

The evaluation technique used for this research is direct measurement.

Measurement is chosen for this research because models and simulations are not used, but rather a live, real-world social networking system is used.

The testing environment consists of two Dell PowerEdge R710 server-grade rack mounts. These rack mounts each have two quad-core Intel® Xeon® E5530 CPUs @

2.40GHz, three 15k RPM hard disk drives giving a total usable storage capacity of 271

GB, and 32 GB or RAM. The VM running the TSF module on one of the rack mounts used Ubuntu 10.10, MySQL 5.1.41, and Python 2.6. The VMs running the TEH instances were described in the factors section. The VMs were managed and hosted using VMWare ESXi 4.1.0.

32

3.8 Workload

The workload for TMCS is the total number of URLs processed by the TEH module. This is ultimately determined by how long URLs are collected, which in this research is set at approximately five weeks.

3.9 Experimental Design

A full factorial design is chosen for this research. Since there is only one factor with two levels, two experiments are required. The expected variance in the responses of the system is unknown, and since such a large workload is used, only one replication is performed. This results in a total two experiments to be executed.

3.10 Methodology Summary

Through the use of Twitter’s APIs, a system is developed to collect an extensive number of URLs, which are processed by a series of VMs to determine if malicious code is present at the given locations.

33

IV. Results

The metrics outlined in the previous chapter determine the performance of TMCS which consists of total statuses examined, total URLs collected, total URLs determined to contain malware, and processing errors.

4.1 Total Statuses Examined

After running for approximately 40 days and 16 hours, the final count of statuses examined by TMCS is 19,309,417. This is an average of 474,822 statuses examined per day, and 19,784 statuses examined per hour. While the total number of statuses examined may seem rather large, it in reality only represents a small fraction of the total number of statuses that are posted on Twitter. Based on statistics provided by Twitter, the total statuses examined by TMCS represents approximately 0.4% of the total number of statuses posted on Twitter during the time of collection [32]. Figure 2 below is a histogram showing the total URLs collected each day.

Figure 2: Total Statuses Examined (statuses examined per day, 24 February through 7 April 2011)
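The per-day and per-hour averages reported above follow directly from the quoted totals; a quick arithmetic check in Python:

# Sanity check of the averages reported in Section 4.1 (figures from the text).
total_statuses = 19309417
collection_hours = 40 * 24 + 16            # approximately 40 days and 16 hours
print(round(total_statuses / (collection_hours / 24.0)))   # ~474,822 per day
print(round(total_statuses / float(collection_hours)))     # ~19,784 per hour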


4.2 Total URLs Collected

The final count of unique URLs collected by TMCS is 466,237. The total number of URLs witnessed by TMCS, including duplicates, is 1,363,935; therefore, only about 34.18% of the URLs gathered were unique. The witnessed URLs came from approximately 1,315,077 separate statuses, indicating that some statuses contained more than one link. This last figure is not exact, as it was generated solely from the status update , which were sent hourly, so some statuses may have been missed during the final hour of collection; the total number of statuses containing URLs was not stored in any other fashion. Figure 3 below displays the URLs collected on a per-day basis.

Figure 3: Total URLs Collected (URLs collected per day, 24 February through 7 April 2011)


4.3 Total URLs Determined To Contain Malware

The following sections detail the types and totals of the URLs determined to potentially contain malware.

4.3.1 Windows XP Configuration Results

The number of URLs flagged as running suspected malicious code when visited by a Windows XP instance of the TEH module was 1,271. After examining the configuration of the Windows XP VMs, it was determined that the Windows Error Reporting feature had been accidentally disabled, which explains why that trigger produced zero results in this data set. Table 2 displays the different ways the ESCAPE system was triggered.

Table 2: Windows XP ESCAPE Results
  Windows Error Reporting                  0*
  Unsigned code not from the executable    217
  Unsigned code from the executable        878
  New Process Creation                     176
  Total                                    1,271
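The per-category totals in Table 2 are tallied from the ESCAPE output. The sketch below only illustrates that tallying step; the log-line format shown is invented for the example, since ESCAPE's actual output format is not reproduced in this document.

from collections import Counter

# Hypothetical log lines; ESCAPE's real output format is not documented here.
escape_log = [
    "TRIGGER=Unsigned code from the executable URL=http://example.com/a",
    "TRIGGER=New Process Creation URL=http://example.com/b",
    "TRIGGER=Unsigned code from the executable URL=http://example.com/c",
]

def trigger_type(line):
    # Everything between "TRIGGER=" and " URL=" names the detection category.
    return line.split("TRIGGER=", 1)[1].split(" URL=", 1)[0]

totals = Counter(trigger_type(line) for line in escape_log)
for category, count in totals.items():
    print(category, count)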

Figure 4, below, shows the number of URLs identified as potentially malicious on a daily basis. An interesting point is the noticeable spike in URLs detected on March 3, 2011, the day on which the Tōhoku earthquake and tsunami struck Japan [33].


Figure 4: Potentially Malicious XP URLs (URLs identified per day)

4.3.2 Windows Vista Configuration Results

The number of URLs flagged as containing suspected malicious code when visited by a Windows Vista instance of the TEH module came to 1,718. Table 3 displays how the ESCAPE system was triggered.

Table 3: Windows Vista ESCAPE Results
  Windows Error Reporting                  633
  Unsigned code not from the executable    745
  Unsigned code from the executable        3
  New Process Creation                     337
  Total                                    1,718

Figure 5 below shows the number of URLs identified as potentially malicious by date collected.


Figure 5: Potentially Malicious Vista URLs (URLs identified per day)

A similar pattern appears when comparing the results of the Windows Vista run to those of the Windows XP run: the noticeable spike on March 3, 2011 is again present.

4.4 Examining Vista and XP Results

A t-test is conducted to assess whether the numbers of malicious URLs from the Windows XP and Windows Vista runs are statistically different from each other. The resulting p-value of 0.2829 suggests that the two data sets are not significantly different. Figure 6 displays a box plot of the two data sets.


Figure 6: XP vs. Vista Box Plot
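For reference, the comparison described above is a standard two-sample t-test. The sketch below shows the calculation using SciPy; the daily counts are placeholders rather than the data collected by TMCS, and SciPy is only one of several tools that could perform this test.

from scipy import stats

# Placeholder daily counts of flagged URLs; not the values collected by TMCS.
xp_daily    = [12, 30, 25, 41, 18, 22, 35, 29]
vista_daily = [20, 44, 31, 55, 27, 30, 48, 37]

t_stat, p_value = stats.ttest_ind(xp_daily, vista_daily)
print("t = %.3f, p = %.4f" % (t_stat, p_value))
# A p-value above 0.05 indicates no statistically significant difference.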

For both runs, the ratios of malicious URLs to total URLs collected were examined to see whether any day had a significantly higher percentage of malicious URLs than the others.

A visible spike is seen on March 6, 2011 for Windows XP and April 2, 2011 for

Windows Vista. Figure 7 shows the ratio of malicious to total URLs for the Windows

XP configuration.


Figure 7: XP Ratios (percentage of malicious to total URLs, by date collected)

Figure 8 shows the ratio of malicious to total URLs for the Windows Vista configuration.

Figure 8: Vista Ratios (percentage of malicious to total URLs, by date collected)


Visually, there are noticeable spikes on March 6, 2011 for the XP configuration and on April 2, 2011 for the Vista configuration. A t-test performed on the above ratios of malicious to total URLs yields a p-value of 0.007898, suggesting that these two data sets are statistically different.

The URLs collected on the days with the noticeable ratio spikes were examined to see if there were any similarities between the two groups. This examination was inconclusive.

4.5 Processing Errors

The total number of tracked errors during this research is 20,376. These errors comprise HTTP connection failures during the URL unshortening process, unshortening parsing failures when a redirection URL could not be found due to relative redirects, and Twitter feed errors when the TSF module encountered malformed JSON data sent from Twitter. HTTP connection failures, caused by issues such as DNS timeouts, make up the large majority of the total errors experienced. Table 4 contains the totals of each distinct type of error experienced during the research process.

Table 4: Processing Errors
  HTTP connection failure         18,844
  Unshortening parsing failure    833
  Twitter feed error              699
  Total                           20,376
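Most of these errors arise in the unshortening step. The sketch below, written for current Python rather than the Python 2.6 and PycURL stack used by TMCS, shows how a single redirect hop might be resolved, how a relative Location header (one source of the parsing failures above) can be handled with urljoin, and how connection failures such as DNS timeouts could be categorized.

import http.client
import urllib.parse

def unshorten_once(url, timeout=10):
    """Follow one redirect hop; return (resolved_url, error_category)."""
    parts = urllib.parse.urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in (301, 302, 303, 307) and location:
            # urljoin resolves relative redirects such as "/landing-page",
            # a cause of the "unshortening parsing failure" category above.
            return urllib.parse.urljoin(url, location), None
        return url, None
    except OSError:
        # Covers DNS timeouts and refused connections ("HTTP connection failure").
        return url, "HTTP connection failure"
    finally:
        conn.close()

print(unshorten_once("http://bit.ly/example"))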


4.6 Manually Identified Malware

During the initial run of TMCS, a configuration error caused the results from the processed URLs to be ignored. During this run, however, one of the processing VMs was found to be infected with malware, which has been categorized as the “Cycbot Trojan” [34]. It became apparent that the VM may have been compromised when an error window appeared stating that a VBScript did not have the correct permissions to execute. A command prompt was also visible that had not been manually launched, and a third visible indicator was that the Windows Help window had been opened. Figure 9 portrays this scenario.

Figure 9: Cycbot Trojan


The distinguishing characteristics of this Trojan were identified while examining the VM. The malware attempted to make NetBIOS connections to the domain “xibudific.cn”; Figure 10 below shows a packet capture of this activity.

Figure 10: Cycbot Packet Capture

An executable associated with this malware was also found running and consuming a large amount of system resources. Figure 11 shows the Cycbot-associated “conhost.exe” process running in Windows Task Manager. This finding provides substantial evidence, even without examining the collected dump files in detail, that URLs hosting malware are able to bypass Twitter's filtering, as discussed previously in Chapter 2.


Figure 11: Cycbot Trojan’s “conhost.exe”

4.7 Results Summary

The experimental results show that TMCS provides URL collection, URL examination, and storage of potential malware, while also tracking statistics and errors encountered. The methodology used and the analysis performed support these findings.


V. Conclusions

5.1 Accomplishments

The TMCS system successfully obtained and analyzed URLs contained within Twitter statuses. Using Twitter's APIs, a MySQL database, Python scripts, and multiple VMs running on VMware ESXi, statuses are gathered and URLs are extracted, stored, and examined. Dump files are successfully created for the URLs determined to be malicious by the ESCAPE system, creating a repository of malware; these dump files can be further examined to determine the root cause of the malicious activity. After processing a total of 466,237 URLs, TMCS deemed 2,989 malicious across the two examination runs using the Windows XP and Windows Vista configurations.

5.2 Contributions

The TMCS system provides a means for researchers to automatically collect instances of malware found on the social networking service Twitter. This collection can be used to develop signatures, to examine potentially novel techniques contained within the malware, and as test data for new malware protection schemes. TMCS can also provide data on how prevalent URLs hosting suspected malicious content are on Twitter, as shown in Chapter 4.

5.3 Future Work

5.3.1 Integration of User Input

By integrating user input, whether performed by actual people or simulated by some form of automation, the TMCS system could capture more instances of malicious activity. This additional input would trigger events beyond those required by drive-by downloads, which were the sole focus of this research. Malware that requires user interaction for successful delivery and execution of a malicious payload, such as fake anti-virus rogueware, could then be witnessed and analyzed.

5.3.2 Prescreening Application

TMCS could also be extended for more practical purposes. It could be altered to examine links contained within Twitter statuses on the fly, as a user requests them. This sort of functionality could be implemented in a proxy-like fashion or as a host-based protection scheme.

5.3.3 Expanding Social Network Compatibility

As the focus of TMCS is on social networks, specifically Twitter, adding the ability to analyze URLs posted on other social networking platforms, such as Facebook or the newer Google+, would be beneficial. Since Twitter is not the only social networking service available and makes up only a fraction of the total social networking traffic generated by users, this expansion would encompass a potentially larger number of URLs to process.

5.3.4 Incorporating Additional Detection Methods

The only system used within this research for detecting potentially malicious activity was ESCAPE. While this system generated a fair number of results, it would be interesting from a research standpoint to compare and contrast the results generated by other detection methods.


5.3.5 Automate Analysis of Captures

The dump files generated when a URL was deemed to be hosting potentially malicious content are not examined in depth, because manual analysis of these dump files requires a considerable amount of time. Implementing a methodology to reliably examine these dump files automatically would significantly reduce diagnosis time.

5.3.6 Processing URLs in Parallel within a Single VM

The TMCS system is able to process multiple URLs at a time through the use of multiple VMs. With some refinement of the ESCAPE log parsing portion of TMCS, it could be possible to have multiple URLs checked at the same time within a single VM.

This type of enhancement could significantly increase the speed at which URLs are processed.

5.3.7 Focus on Semantically Relevant Information

When collecting statuses to extract URLs, TMCS requested statuses referring to the most popular trending topics, and these topics were discarded after being used as search parameters. Storing and examining them may provide further insight into the distribution of malware on Twitter.

5.4 Conclusion

The TMCS system is a successful URL collection and examination platform. It is able to gather and process URLs from the social networking service Twitter and to store dump files of detected malware. The possibilities for future additions and applications of TMCS are promising, and malware research stands to gain much from using such a system for malware collection and examination. By applying this methodology, social networking platforms can become a safer venue for communication.


Bibliography

[1] Graham Cluley. (2011, June) Simon Pegg is Twitter-hacked, warns fans of Trojan horse threat. [Online]. http://nakedsecurity.sophos.com/2011/06/26/simon-pegg-itwitter-hacked/

[2] Robert Moir. (2003, October) Microsoft TechNet. [Online]. http://technet.microsoft.com/en-us/library/dd632948.aspx

[3] SecureList. History of malicious programs. [Online]. http://www.securelist.com/en/threats/detect?chapter=77

[4] Rishabh Dangwal. (2010, January) Simplest Virus - Fork Bomb. [Online]. http://viralpatel.net/blogs/2010/01/simplest-virus-fork-bomb.html

[5] Nicolas Falliere, Liam O Murchu, and Eric Chien, "W32.Stuxnet Dossier," Symantec, 2011.

[6] Carnegie Mellon University. (1999, February) CERT (R) Advisory CA-1999-02 Trojan Horses. [Online]. http://www.cert.org/advisories/CA-1999-02.html

[7] Sean-Paul Correll and Luis Corrons, "The Business of Rogueware: Analysis of the New Style of Online Fraud," pp. 3-7, 2010.

[8] Yi-Min Wang, Doug Beck, Xuxian Jiang, and Roussi Roussev, "Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites that Exploit Browser Vulnerabilities," January 20, 2006. [Online]. http://research.microsoft.com/en-us/um/redmond/projects/strider/honeymonkey/

[9] Microsoft. (2005, July) Microsoft Security Bulletin MS05-037. [Online]. http://www.microsoft.com/technet/security/bulletin/ms05-037.mspx

[10] Alexander Moshchuk, Tanya Bragin, and Damien Deville, "SpyProxy: Execution-based Detection of Malicious Web Content," pp. 2-13, 2007.

[11] Mengjun Xie, Zhenyu Wu, and Haining Wang, "HoneyIM: Fast Detection and Suppression of Instant Messaging Malware in Enterprise-like Networks," pp. 2-8, 2007.

[12] Ben Feinstein and Daniel Peck, "Caffeine Monkey: Automated Collection, Detection and Analysis of Malicious JavaScript," pp. 1-12, 2007.

[13] OWASP. Clickjacking. [Online]. https://www.owasp.org/index.php/Clickjacking

[14] Marco Balduzzi, Manuel Egele, Engin Kirda, Davide Balzarotti, and Christopher Kruegel, "A solution for the automated detection of clickjacking attacks," ASIACCS '10 Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, pp. 2-6, 2010.

[15] Microsoft. Security Intelligence Report. [Online]. http://www.microsoft.com/security/sir/guide/default.aspx#!section_7_1

[16] Mark A. Barwinski, Cynthia E. Irvine, and Tim E. Levin, "Empirical Study Of Drive-By-Download Spyware," Naval Postgraduate School, Monterey, California, USA, 2006.

[17] Microsoft. Microsoft TechNet. [Online]. http://technet.microsoft.com/en-us/library/cc738483%28WS.10%29.aspx

[18] Skape and Skywing, "Bypassing Windows Hardware-enforced Data Execution Prevention," nologin, pp. 2-8, 2005.

[19] Paruj Ratanaworabhan, Benjamin Livshits, and Benjamin Zorn, "NOZZLE: a defense against heap-spraying code injection attacks," 2009.

[20] Salvatore Guarnieri and Benjamin Livshits, "GATEKEEPER: mostly static enforcement of security and reliability policies for javascript code," SSYM'09 Proceedings of the 18th conference on USENIX security symposium, pp. 151-168, 2009.

[21] Scott Gilbertson. (2007, April) Twitter Vulnerability: Spoof Caller ID To Take Over Any Account. [Online]. http://www.webmonkey.com/2007/04/twitter_vulnerability_spoof_caller_id_to_take_over_any_account/

[22] Daniel Sandler. (2009, February) Twitter: Don't Click (Explained). [Online]. http://dsandler.org/outgoing/dontclick_orig.html

[23] Lynne Pope. (2009, April) Mikeyy Twitter XSS Mutates & Continues to Attack. [Online]. http://lynnepope.net/mikeyy-twitter-xss-mutates-continues-to-attack

[24] Mike Butcher. (2010, September) Warning: Mouseover tweets security flaw is wreaking havoc on Twitter [Updated]. [Online]. http://eu.techcrunch.com/2010/09/21/warning-mouseover-tweets-security-flaw-is-wreaking-havoc-on-twitter/

[25] Lucian Constantin. (2009, August) Malicious URL Filtering on Twitter. [Online]. http://news.softpedia.com/news/Malicious-URL-Filtering-on-Twitter-118308.shtml

[26] Brian Prince. (2010, March) Twitter Fights Phishing, Malware with Link Scanning Service. [Online]. http://www.eweek.com/c/a/Security/Twitter-Fights-Phishing-Malware-With-Link-Scanning-Service-556715/


[27] Deputy Secretary Of Defense, "Directive-Type Memorandum (DTM) 09-026 - Responsible and Effective Use of Internet-based Capabilities," 2010.

[28] bitly. Does bitly ever re-use links? [Online]. http://bit.ly/pages/help#i_1_4

[29] PycURL. (2010, January) PycURL. [Online]. http://pycurl.sourceforge.net/

[30] Barry Schwartz. (2010) Google Web Report: Average Page Size 320 KB. [Online]. http://searchengineland.com/google-web-report-average-page-size-320-kb-46316

[31] Wikipedia. Usage Share of Operating Systems. [Online]. http://en.wikipedia.org/wiki/Usage_share_of_operating_systems

[32] Twitter. (2011, March) #numbers. [Online]. http://blog.twitter.com/2011/03/numbers.html

[33] Wikipedia. (2011) 2011 Tōhoku earthquake and tsunami. [Online]. http://en.wikipedia.org/wiki/2011_T%C5%8Dhoku_earthquake_and_tsunami

[34] ThreatExpert. (2010, December) ThreatExpert Report: Backdoor.Cycbot, Backdoor.Cycbot!gen2, Backdoor.Win32.Gbot.bs. [Online]. http://www.threatexpert.com/report.aspx?md5=178a0b1875a1007349793b29f2e69580
