
AN INTROSPECTIVE BEHAVIOR BASED METHODOLOGY TO MITIGATE E-MAIL BASED THREATS

BY
MADHUSUDHANAN CHANDRASEKARAN

THESIS

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science and Engineering in the Graduate School of the State University of New York at Buffalo, 2009

Buffalo, New York

© Copyright by Madhusudhanan Chandrasekaran, 2009
All Rights Reserved

To my family.

Abstract

E-mail is touted as the backbone of present-day communication. Despite its convenience and importance, the existing e-mail infrastructure is not devoid of problems. The underlying e-mail protocols operate on the assumption that users would not abuse the privilege of sending messages to each other. This weakness in design is consistently taken advantage of by attackers to carry out social engineering and security exploits on day-to-day e-mail users. As a result, three prominent e-mail based threats have surfaced, viz. (i) spam; (ii) phishing; and (iii) information leak. While spam e-mail classification has received a lot of attention in recent years, the other two threats still loom large. The main goal of this dissertation is to design and develop efficient behavior based classification techniques that help to address each of these threats in a piecemeal fashion.

The first part of this dissertation attempts to tackle the problem of detecting phishing e-mails before they reach users’ inboxes. To begin with, shortcomings of existing spam filters in classifying phishing e-mails are highlighted. To overcome them, a customizable and usable spam filter (CUSP) that detects phishing e-mails from the absence of personalized user information contained in them is proposed. However, as relying solely on the presence of personalized information as the criterion to detect phishing e-mails is not entirely foolproof, a novel machine learning based classifier that separates phishing e-mails based on their underlying semantic behavior is proposed.
Experimentation on real-world phishing and financial e-mail datasets demonstrates that the proposed methodology can detect phishing e-mails with over 90% accuracy while keeping the false positive rate to a minimum. Also, the feasibility of generating context-sensitive warnings that better educate users about the ill effects of phishing attacks is explored.

Classification techniques that operate on features confined to the body of phishing e-mails can be thwarted using simple obfuscation techniques, which substitute spurious content appearing in them with seemingly innocuous characters or images. To address such scenarios, the second part of this dissertation takes the classification process a step further to analyze the behavior and characteristics of Websites referred to by URLs contained in e-mails. Specifically, a challenge-response based technique called PHONEY is proposed to detect phishing Websites based on their inability to tell fake and genuine inputs apart. Experimental results based on evaluation on both “live” and “synthesized” phishing Websites reveal that PHONEY can detect almost all of the e-mails that link to live phishing Websites with zero false positives and minimal computation overhead. In a similar vein, this dissertation proposes a novel technique to identify spam e-mails by analyzing the content of the linked-to Websites. A combination of textual and structural features extracted from the linked-to Websites is supplied as input to five machine learning algorithms employed for the purpose of classification. Testing on live spam feeds reveals that the proposed technique can detect spam e-mails with a detection rate of over 95%, thereby exhibiting better performance than two popular open source anti-spam filters.

Information leaks pose significant risk to users’ privacy.
An information leak could reveal users’ browsing characteristics or sensitive material contained in their e-mail inboxes to attackers, allowing them to launch more targeted social engineering attacks (e.g., spear phishing attacks). The third part of this dissertation focuses on addressing these two facets of information leaks, i.e., information leaks triggered by spyware and by users, by detailing the limitations of the state-of-the-art detection techniques. In order to bring out the deficiencies in existing anti-spyware techniques, first, a new class of intelligent spyware that efficiently blends in with user activities to evade detection is proposed. As a defensive countermeasure, this dissertation proposes a novel randomized honeytoken based methodology that can separate normal and spyware activities with near perfect accuracy. Similarly, to detect inadvertent information leaks caused by users sending misdirected e-mails to unintended recipient(s), this dissertation advances the existing bag-of-words based outlier detection techniques by using a set of stylometric and linguistic features that better encapsulate the previously exchanged e-mails between the sender and the recipient. Experimentation on a real-world e-mail corpus shows that the proposed technique detects over 78% of synthesized information leaks, outperforming other existing techniques.

Another important point to be considered while devising specialized filters to address each of the e-mail based threats is the need to make them interoperable. For example, an e-mail supposedly sent from a financial domain, but having a URL referring to a domain blacklisted for spam, is very likely a phishing e-mail. Identifying sources of attacks helps in developing attack agnostic solutions that block all sensitive communication from and to misbehaving nodes.
From this perspective, this dissertation explores the feasibility of building a holistic framework that not only operates in conjunction with intrusion detection systems (IDS) to block incoming and outgoing traffic from and to misbehaving nodes, but also safeguards the underlying e-mail infrastructure from zero-day attacks.

Acknowledgments

My advisor, Dr. Shambhu Upadhyaya, deserves many thanks. Under his tutelage, I could transform my otherwise loose ideas into something as concrete as this dissertation. From the onset, he actively involved me in various research projects and meetings, which helped me in building and strengthening my academic outlook. It has been a great pleasure to have worked under him.

I would also like to thank my committee members, Dr. Hung Ngo and Dr. Sheng Zhong, for their support and guidance. Dr. Ngo had been a committee member for my Master’s thesis as well. Dr. Ngo is a great teacher, and is ever willing to embrace new ideas and provide constructive criticism. I am indebted to him for the advice he has given me on both the professional and personal fronts. Taking a seminar and independent study under Dr. Zhong was a fruitful and invigorating experience. Every discussion with him was thorough and in-depth, and always imparted something in the end.

I would like to thank my mentor at Google, Dr. Arash Baratloo, for providing me with insight into transforming research grade ideas into complete, tangible products. Despite his hectic schedule, he took time to read my papers and sit through my presentations to provide invaluable suggestions. In a similar vein, I would like to thank Dr. Richard Wasserman and Ms. Maureen (Cheshire) Dantzler for giving me an opportunity to work in the Transaction Risk Management (TRMS) department at Amazon Inc. It was there that I managed to get a “sneak view” of the anti-fraud life cycle, from detection to takedown of fraudulent sellers, in an industry setting.

My stay at Buffalo has been a pleasant and productive experience.
This, however, would not have been possible without the fun and frolic that I had with my labmates and housemates during the last several years. I thoroughly enjoyed the engaging conversations and eat-out breaks I had with my friends, including Aarthie, Anusha, Ashish, Duc, Mohit, Murtuza, Ram, Sunu, Suranjan and Vinod. I would also like to thank Vidyaraman (Video) for agreeing to be my agreeable roommate both at home and school.

On a personal note, I would like to thank my wife Anusuya. Her unwavering love and enduring support helped me paddle my way through murky situations. I hope that I have not dragged her too far in the process. I also thank her for helping me proofread this dissertation, without which it would not be in its current form. I would like to thank my grandparents, parents and brother for believing in me, even when I did not. My mother made me realize that there is much more to life than a “bookish” degree. I pray that I never diminish the trust my family has bestowed upon me. Thanks to my cousins Vijay and Aarthi for being with me during the final stages of this dissertation.

Last but not least, I would like to thank all the faculty, staff, and students of the Computer Science and Engineering department with whom I interacted at some point during my stay here.

Table of Contents

Abstract
Acknowledgments
Chapter 1 Introduction
  1.1 E-mail Communication
  1.2 What Makes Secure Communication Hard?
    1.2.1 Authentication
    1.2.2 Integrity
    1.2.3 Non-repudiation
    1.2.4 Problems with Existing E-mail Security Enforcement Schemes
  1.3 E-mail-based Security Threats
  1.4 Dissertation Scope
  1.5 Original Contributions
  1.6 Dissertation Outline
Chapter 2 Background and Related Work
  2.1 Introduction
    2.1.1 Chapter Organization
  2.2 Phishing E-mail Detection
    2.2.1 Discussion
  2.3 Validating E-mails Through Referral Webpages Analysis
    2.3.1 Discussion
  2.4 Preventing Information leak in Emails
    2.4.1 Discussion
  2.5 Summary
Chapter 3 Detection of Phishing E-mails Based on Structural and Linguistic Features
  3.1 Introduction
    3.1.1 Contributions
    3.1.2 Chapter Organization
  3.2 Commonly Adopted Phishing Attack Vectors