Recognition of Phishing Attacks Utilizing Anomalies in Phishing Websites

Sunil Chaudhary

University of Tampere
Department of Computer Sciences
Computer Science/Software Development
M.Sc. thesis
Supervisor: Eleni Berki
November 2012

University of Tampere
Department of Computer Sciences
Computer Science/Software Development
Sunil Chaudhary: Recognition of phishing attacks utilizing anomalies in phishing websites
M.Sc. Thesis, 78 pages, 15 index and appendix pages
November 2012

The fight against phishing has resulted in several phishing prevention techniques. However, they address the phishing problem only partially. A large number of Internet users are still tricked into disclosing their personal information to fake websites every day. This might be because existing phishing prevention techniques are either not foolproof or unable to deal with emerging changes in phishing.

The main purpose of this thesis is to identify anomalies that can be found in the Uniform Resource Locators (URLs) and source code of phishing websites and to determine an efficient way to employ those anomalies for phishing detection. To do that, I performed a meta-analysis of several existing phishing prevention techniques, specifically heuristic methods. I then selected forty-one anomalies that can be found in the URLs and source code of phishing websites and that are also mentioned or utilized by past studies. This was followed by verification of those anomalies in an experiment conducted on twenty online phishing websites.

The study revealed that some anomalies that were once significant for phishing detection are no longer included in present-day phishing websites, and that several anomalies are also widely present in legitimate websites. Such ambiguous anomalies need further analysis to determine their significance in phishing detection.
Moreover, it was also found that several heuristic methods use an insufficient set of anomalies, which introduces inaccuracy in their results. Finally, in order to design an efficient heuristic method employing anomalies that can be found in the URLs and source code of phishing websites, it is suggested to give due priority to anomalies that are difficult for phishers to bypass, found only in phishing websites, seriously harmful, independent of other anomalies, and quick to evaluate.

Key words and terms: phishing, phishing prevention, URL, DOM objects, whitelist, blacklist, heuristics, meta-analysis, software quality.

Acknowledgement

I would like to express my sincere thanks and deep appreciation to my professor and supervisor Eleni Berki for her guidance and valuable comments. I am equally thankful to Marko Helenius (Tampere University of Technology) for the constructive feedback. I would also like to thank Linfeng Li for sharing his experiences on phishing research and suggesting various useful materials that I used for my thesis. My sincere thanks also go to my English teachers, Robert Hollingsworth and Julie Rajala, who helped me to get familiar with the rules of academic writing. I would also like to thank my professors Jyrki Nummenmaa and Zheying Zhang, as well as all the attendees of the seminar course entitled “Master’s Thesis Seminar in Software Development”, for their suggestions and feedback. Last but not least, I am thankful to my professor Mikko Ruohonen, who provided me with a summer traineeship and ample freedom to complete a large part of my thesis during the traineeship period.

Sunil Chaudhary
2nd December 2012, Tampere

Contents

1. Introduction
  1.1. The phishing epidemic
  1.2. Research questions
  1.3. Anomalies in phishing websites are suitable for phishing detection
  1.4. Thesis contribution
  1.5. Thesis outline
2. Review of phishing prevention methods
  2.1. Meaning of phishing prevention methods
  2.2. Important factors for effective phishing prevention methods
    2.2.1. Phishers’ behavior and phishing techniques
    2.2.2. Internet users’ behavior and decision making process
  2.3. Objectives of existing phishing prevention methods
    2.3.1. Reasons behind internet users’ tendency to fall for phishing
    2.3.2. Design techniques to educate and aware about phishing
    2.3.3. Design effective UI and warning to alert about phishing
    2.3.4. Development of countermeasure to automatically detect phishing
    2.3.5. Evaluate the effectiveness of existing phishing prevention methods
    2.3.6. The need to invent proactive strategies for phishing prevention
  2.4. Classification of phishing prevention techniques
  2.5. Phishing prevention applications
3. Analysis of strength and limitations of technical phishing prevention methods
  3.1. List based methods
    3.1.1. Whitelist method
    3.1.2. Blacklist method
  3.2. Heuristic methods
    3.2.1. Use of visual similarity measures in phishing detection
    3.2.2. Use of search engine in phishing detection
    3.2.3. Use of anomalies in phishing websites for phishing detection
4. Investigating anomalies in phishing websites
  4.1. Anomalies found in the URLs of phishing websites
  4.2. Anomalies found in the source codes of phishing websites
  4.3. Verification of anomalies using online phishing websites
  4.4. Discussion on findings
5. Conclusions
6. Limitations and future development work
References
Appendix

1. Introduction

1.1. The phishing epidemic

Online services are an integral part of modern society. They make information readily accessible from any place through the Internet. This feature is equally utilized by both service providers and users.
Service providers are able to penetrate and cover large markets easily at a low operational cost, whilst users are able to choose from a wide range of services and use them regardless of time and location. Unfortunately, these services have not escaped the attention of cybercriminals. One of the major drawbacks of using such services is the risk of phishing.

Phishing is a fraudulent activity carried out using electronic communication to acquire personal information for malicious purposes. This information can include bank or financial institution authentication credentials, social security numbers, credit card details, and online shopping account information, with which phishers usually defraud their victims. Phishers employ a number of techniques, such as social engineering schemes and technical subterfuge [APGW, 2012], in order to lure potential victims and make them divulge their account details and other sensitive information.

(i) Social engineering scheme. In general, phishers use emails that masquerade as coming from a legitimate and trustworthy source, such as a bank, an auction site, or an online commerce site [APGW, 2012], and redirect victims to an authentic-looking counterfeit website to deceive them into disclosing sensitive information. Other media, such as postal mail, phone calls, and instant messaging, are also used to reach potential victims and lure them into disclosing confidential information. However, fake emails and phony websites are an easy and economically viable means of targeting a large number of potential victims at once, which may be one reason they are so widely used for phishing. The fake emails and phony websites used by phishers have evolved to become technically deceptive and hard for casual detection methods to detect. Phishing emails often create a sense of urgency to motivate
