World-wide cloaking phishing websites detection

by

Ziji Guo

A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Major: Computer Engineering

Program of Study Committee: Yong Guan, Major Professor; Long Que; Zhenqiang Gong

The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this thesis. The Graduate College will ensure this thesis is globally accessible and will not permit alterations after a degree is conferred.

Iowa State University Ames, Iowa 2018

Copyright © Ziji Guo, 2018. All rights reserved.

TABLE OF CONTENTS

Page

LIST OF TABLES ...... iii

LIST OF FIGURES ...... iv

ACKNOWLEDGEMENTS ...... v

ABSTRACT ...... vi

CHAPTER 1. OVERVIEW ...... 1

1.1 Motivation ...... 1

1.2 Contribution ...... 3

1.3 Organization of the Manuscript ...... 4

CHAPTER 2. REVIEW OF LITERATURE ...... 5

2.1 Introduction ...... 5

2.2 Cloaking ...... 5

2.2.1 Cloaking Types ...... 6

2.3 Phishing ...... 7

2.3.1 Phishing Types ...... 7

2.3.2 Existing Phishing detections ...... 8

2.4 Tor network ...... 10

2.5 Criteria Review ...... 12

CHAPTER 3. TECHNICAL APPROACH ...... 13

3.1 Introduction ...... 13

3.2 Pre-setting ...... 14

3.3 Data Collection ...... 14

3.3.1 Collecting ...... 14

3.3.2 Loading URLs ...... 16

3.3.3 Classifier ...... 18

3.3.4 Identify Trustful Website ...... 19

3.4 Detect Cloaking Phishing Websites ...... 21

CHAPTER 4. EXPERIMENTAL EVALUATION ...... 26

4.1 Introduction ...... 26

4.1.1 Hypothesis ...... 26

CHAPTER 5. SUMMARY AND FUTURE WORK ...... 28

REFERENCES ...... 31

LIST OF TABLES

Page

Table 2.1 Cloaking Set ...... 6

Table 3.1 Keyword Set ...... 16

Table 3.2 Database Table ...... 17

Table 3.3 Relational Factor ...... 19

LIST OF FIGURES

Page

Figure 2.1 How Tor Network Steps ...... 11

Figure 3.1 Multi-tiered Architectural Framework for Malicious Domain Investigation

and Detection ...... 13

Figure 3.2 Procedure of Data Collection ...... 15

Figure 3.3 Google’s Approach ...... 16

Figure 3.4 Phishing example 1 ...... 24

Figure 3.5 Phishing example 2 ...... 25

Figure 4.1 Phishing-Suspicious ...... 27

Figure 4.2 Suspicious-Total ...... 27

ACKNOWLEDGEMENTS

First and foremost, I would like to take this opportunity to express my thanks to my committee chair, Yong Guan, and my committee members, Long Que and Zhenqiang Gong, for their guidance and support throughout the course of this research. I would also like to thank my friends, colleagues, and the department faculty and staff, who made my time at Iowa State University wonderful and meaningful. Finally, thanks to my family for their encouragement and support.

ABSTRACT

There have been increasing levels of sophistication in the continuous battles between cyber criminals and law enforcement/cybersecurity practitioners. For example, darknet operators often take advantage of evasion techniques to hide their criminal activities, such as hosting illegal content, selling illegal materials, or exchanging terror-related information. Web cloaking and fast-fluxing are two common evasion techniques. Fast-fluxing constantly changes the host IP addresses and offers a higher degree of availability and robustness for malicious domain users. Web cloaking allows ordinary dynamic web content to be served most of the time, while different content may be triggered by specific search-engine keywords or geo-locations. Both pose great challenges to cybersecurity and law enforcement practitioners, because at the time of evidence collection, the data obtained from the source may simply not be the same as the content generated earlier for malicious purposes.

Index Terms: Malicious Domains, Cloaking Technique, Fast-Fluxing, Network Security

CHAPTER 1. OVERVIEW

In this paper, we present new active probing-based schemes for detecting cloaking and fast-flux malicious domains. In our prototype platform, we have integrated our schemes with the Tor system so that our queries and evidence collection are anonymous and distributed, to avoid detection by malicious domain-hosting servers. During the last 10 months, we have used this system to collect evidence data using six of the top ten worldwide search engines (e.g., Bing, Baidu, Ask, AOL, Lycos, and Search). Using the collected data, we developed algorithmic data-analytic solutions to extract and classify the malicious fast-fluxing and web-cloaking domains. Such effective evidence collection and analytic solutions will help law enforcement practitioners in their casework handling such malicious domains.

1.1 Motivation

The origins of the Internet date back to the 1960s, when people first began to build robust, fault-tolerant communications via computer networks. In the early 1990s, we saw the beginning of the transition to the modern Internet and the initiation of Web-based services, motivating more and more enterprise networks and personal computers to be connected to the Internet. By the late 2000s, Internet-based services and technologies had been incorporated into virtually every aspect of modern life; as an associated effect, many suspicious domains appeared on the Internet. For example, it has been reported that many active online darknets (using Tor hidden services), as well as other illegal online underground markets, have been discovered to be relevant to cybercrime.

Phishing is another serious internet problem.

Throughout the world, search engines perform about 120,000 searches per second. In 2016, 1,220,523 phishing websites were detected, a full 65 percent increase over 2015, and the number of phishing attacks has been increasing greatly since the latter part of the 20th century.

Such attacks may be disguised as spoofed emails or fake websites, and victims are usually tricked by attackers into unknowingly providing their personal information, allowing attackers to capture user names, passwords, banking accounts, credit card numbers, etc. Attackers steal billions of dollars per year using phishing attacks, and network security research focused on such attacks has produced a variety of detection systems, the most popular type of anti-phishing tool being a web browser plugin.

These technologies, however, exhibit some serious drawbacks, and contemporary attackers make detecting phishing attacks much harder by using web-cloaking techniques to perpetrate their online crimes. According to my study, most contemporary cyber-criminal detection techniques are based on a black-list system, analysis of DNS traffic, domain information, etc., while phishing attacks more closely resemble social engineering, and cloaking techniques rely on IP identification, user-agent, referer, and geo-location methods.

Detection techniques aimed at cloaking might monitor data downloading with typical limits of only 15 MB, massive external scripts, and unauthorized redirects, but all proposed detection techniques seem to present some serious disadvantages. Additionally, attackers have developed and are now using more advanced techniques, like web cloaking, to conduct their online criminal activities, making detection of phishing or other malicious attacks even more difficult.

With respect to cloaking attacks, also known as 'bait and switch', the very first study we have seen was that of Wu and Davison [1]. A cloaked website can hide the original content and nature of a website by delivering different intended webpage content to selected specific user groups based, for example, on location or age preference. Nowadays, while users are accustomed to relying heavily on information returned by search engines like Google, Baidu, and Bing, we have seen more and more untrustworthy information and content returned from such online sources. For example, many people use the Google search engine simply to find a particular website and click it, and users still generally trust results from major search engines and are willing to believe that the relevant and useful websites listed in the top-ranking positions returned by the search engine are legitimate, even though this may not be true.

This paper aims at detection and investigation of the kinds of domains that host malicious websites/contents appearing in the top-ranking positions of search engine results. In this paper, we first identify and characterize the occurrence of suspicious interaction between malicious domains (for example, phishing or cloaking-based illegal online drug-seller domains) and search engine results, and then discuss possible effective countermeasures against them.

1.2 Contribution

In this study, we first explore issues that deserve attention with respect to detecting and investigating malicious domain effects present in search engine results, considering the phishing attack and its variations as an example. In general, phishing websites/contents are not designed for specific victims, and they vary considerably in terms of their strategies, hiding techniques, and other environmental factors. For example, IP fast-fluxing and web cloaking are the two most common phishing schemes, and countermeasures against them can be ineffective and suffer from serious drawbacks such as the inability to properly handle IP fast-fluxing or to detect cloaking-based phishing websites, because they can only detect phishing websites with some specific and fixed features.

We have developed a distributed, active-probing-based scheme for collecting evidence about malicious domains that use Fast-Flux and Web-Cloaking techniques. The platform uses Tor as a supporting infrastructure for querying search engine servers from different geographical regions located all over the world. The wide range of selected phishing websites and associated search engine results described in our study also ensures that our results are general, universal, and reliable, and avoid location-specific preferences. We have designed effective data-analytic approaches to investigate malicious Fast-Flux and Web-Cloaking based domains. Our approach combines the advantages and strengths of previously known phishing detection systems by extracting features, contents, server metadata, and IP addresses. We carefully collect search terms, and, based on our own preliminary analysis, repeatedly and anonymously crawl the pages from different locations and at different times.

Our experimental results demonstrate that these approaches produce superior accuracy and are indispensable in effectively dealing with malicious domains. We have conducted statistical analysis on the results returned from several search engines and keyword categories. We provide a conjecture explaining the phenomenon discovered during the study, and develop an effective solution to the phishing problem.

1.3 Organization of the Manuscript

We have organized the rest of this paper in the following way. We describe the background of malicious domain detection and related work in Chapter 2 to provide an overview of malicious domain investigation and point out some issues in current research. Chapter 3 provides an in-depth study and description of our technical solutions. Chapter 4 describes several experiments and the corresponding analysis of the investigative approaches. Finally, we conclude the paper and suggest future directions in Chapter 5.

CHAPTER 2. REVIEW OF LITERATURE

2.1 Introduction

Most known anti-phishing tools are based on 'black-list' systems, data transactions, content monitoring, and HTTP headers, but for a long time some phishing attackers have been using web-cloaking techniques. The authors of [2] mention that they tested 10 security toolbars and found that most security toolbars do not work as expected, identifying only 50 percent of the phishing websites in their study. The situation might be much worse with web cloaking, in which phishing sites present trustworthy web page content to evade possible detection by anti-phishing tools, then deliver phishing content to the victims and wait for the victims to take the bait so they can steal the victims' private information.

In this chapter, we explain the features of the Tor network, the cloaking technique, and the phishing technique. Cloaking techniques have been used in many areas, such as search results, advertisements, and auto-downloads. Attackers have also been using phishing techniques to steal private user information and financial account information for many years. Tor is free software for worldwide anonymous communication that anyone can use to enter the network; a Tor-driven browser can also be programmed with the Selenium package, which is open source and easy to read and edit.

2.2 Cloaking

Cloaking was defined in the late 1990s as a search engine optimization technique. The original use of cloaking permitted a web page holder to optimize the performance of a search engine crawler.

Each search engine has its own crawler (e.g., Google has 'Googlebot') that reads and visits webpages all the time, and attackers noticed that they could achieve very effective penetration using cloaking techniques, often contemporarily described as spam hiding techniques. With cloaking techniques, the website content presented to the search engine spider is manipulated by the owner of the website, so different visitors would view different content in a browser. Perkins [3] indicated that the main purpose of a cloaking technique is to manipulate the search engine rankings of returned results. In many senses, cloaking techniques have been treated as search engine spam accomplished simply by delivering content based on IP addresses or HTTP headers associated with user requests.

2.2.1 Cloaking Types

Wang and Savage [4] distinguished four different cloaking types: Repeat cloaking, User-agent cloaking, Redirection cloaking, and IP cloaking, and all existing cloaking detection methods have limitations with either IP cloaking or Redirection cloaking detection. Because of the particular features of a search engine spider, it cannot visit a website the way a real browser does, so search engine spiders cannot execute JavaScript or PHP code built into a website. Unfortunately, malicious software components are made from exactly such code.

Table 2.1 Cloaking Set

Type                  Definition
Repeat Cloaking       Web server delivering different web content based on the visit count of visitors
User-agent Cloaking   Web server delivering specific web content based on the visitor's User-Agent string
IP Cloaking           Web server delivering specific web content based on the visitor's IP
Redirection Cloaking  Web server redirecting users to another website by using JavaScript
Referrer Cloaking     Web server delivering specific web content based on the visitor's Referer string

Using a repeat cloaking technique, a web server can deliver different web content to a given visitor; for example, a visitor might view different content on the same webpage if he revisits the website. Using a user-agent cloaking technique, a web server identifies the visitor from its User-Agent string; for example, the Google search engine spider's User-Agent string contains 'google', Yahoo's spider uses 'slurp', and Microsoft's spiders use 'bingbot' and 'msnbot'. Using an IP cloaking technique, a web server can deliver specific web content based on a visitor's IP; for example, a visitor could view different web content when visiting the same website from two different computers. Using a redirection cloaking technique, a web server can redirect users to another website through massive scripts; for example, a visitor might enter the URL of webpage A but actually visit webpage B. Using a referrer cloaking technique, a web server might deliver specific web content based on a visitor's Referer string; for example, a visitor might view different web content depending on whether he clicks a URL returned by a Google search or by a Bing search.
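As a concrete illustration of how user-agent and referrer cloaking can be probed from the client side, the following is a minimal sketch (not part of this thesis's toolchain): it fetches the same URL under a crawler-like User-Agent, a browser-like User-Agent, and a search-engine Referer header, then compares the responses. The header strings and the 0.9 similarity threshold are illustrative assumptions.

```python
# Minimal sketch: probe a URL for user-agent and referrer cloaking by
# requesting it under different client identities and comparing the bodies.
# The header values and the 0.9 similarity threshold are illustrative choices.
import difflib
import requests

PROFILES = {
    "crawler": {"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
    "browser": {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    "search_referrer": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Referer": "https://www.google.com/search?q=example",
    },
}

def fetch_variants(url: str) -> dict:
    """Download the page once per client profile."""
    bodies = {}
    for name, headers in PROFILES.items():
        resp = requests.get(url, headers=headers, timeout=30)
        bodies[name] = resp.text
    return bodies

def looks_cloaked(url: str, threshold: float = 0.9) -> bool:
    """Flag the URL if any two variants differ substantially."""
    bodies = list(fetch_variants(url).values())
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            ratio = difflib.SequenceMatcher(None, bodies[i], bodies[j]).ratio()
            if ratio < threshold:
                return True
    return False

if __name__ == "__main__":
    print(looks_cloaked("http://example.com/"))
```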

2.3 Phishing

Phishing is defined as a social engineering technique that attempts to obtain sensitive information such as usernames, passwords, credit card information, bank information, social security numbers, etc., typically under the disguise of a trustworthy entity in network communications. The Fast Flux Service Network (FFSN), a technique that has been used by online criminals for a long time, is a network of compromised computer systems with public DNS records that constantly change. The basic idea of an FFSN is to have numerous IP addresses, changing at high frequency, associated with a single fully qualified domain name, to keep the online criminal safe.

2.3.1 Phishing Types

This technique can hide malicious activities and extend the malicious servers' lifetime, significantly increasing the damage done by an online criminal. A content delivery network (CDN) is a large interconnected system of cache servers that uses geographical proximity to deliver data. A Round Robin Domain Name (RRDN) system is a type of domain name service that distributes clients to multiple servers in a round-robin manner and scales performance without providing high availability. Fast-flux DNS is a technique that a cybercriminal might use to prevent identification of his key host server's IP address [20]. By abusing the way the domain name system works, the criminal can create a botnet with nodes that join and drop off the network faster than law enforcement officials can trace them.

Fast-flux networks (FFNs) are differentiated from legitimate CDNs by their nodes. The nodes in a legitimate CDN are professionally administered machines, while the nodes in an FFN, also known as flux-agents, are malware-infected machines.

Flux-agents are often found to be part of a botnet and remotely controlled by a botmaster. Botnets using the fast-flux technique can obtain an extra layer of protection by using flux-agents as proxies that relay user requests to back-end servers, also known as motherships, because the frequent and fast changing of flux-agents in the FFN makes it more difficult to track down criminal activities and shut down their operation.

2.3.1.1 Single flux and Double flux

A single fast-flux system usually contains multiple nodes for both registering and unregistering addresses. This kind of model records the DNS address of a single DNS entry and generates a changing address list related to the single domain; this list can contain hundreds or even thousands of entries. Generally, the TTL (Time To Live) of single-flux DNS records is very short, because a cybercriminal wants to make sure that the record and the address will not be recorded.

Double flux is similar to single flux but much more difficult to deal with. Multiple servers combine into a mothership that registers and unregisters the generated DNS lists and NS records. This type of system has a protection layer, i.e., even if a node of the system has been detected, the address of the host cannot be found. Because of the huge number of agents, it is possible to completely protect the cybercriminal, which also increases the survival rate of the malicious software.

2.3.2 Existing Phishing detections

Varshney, et al. [5] proposed a Lightweight Phish Detector (LPD) that detects phishing sites by gathering information from the Google search engine. The domain is extracted from the URL of a suspicious website and combined with the website title to form a search keyword. To identify a suspicious phishing website, they send this search keyword as a query to the Google search API and analyze the top N results returned to determine whether the website is legitimate or suspicious.

Cantina’s [6] technique detects phishing sites based on the textual content of a website resource.

It uses an algorithm called Term Frequency-Inverse Document Frequency (TFIDF) to gather highly important words from the textual content of the website. IDF measures the importance of such a term across the entire collection. TFIDF sets weights to each word in the collection document and queries with top weighted words to the search engine to determine whether website is legitimate.
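As a rough illustration of the TF-IDF weighting that Cantina relies on, the sketch below computes TF-IDF scores for one page against a small corpus and returns its top-weighted terms. It is a simplified stand-in with naive tokenization, not Cantina's implementation; the example corpus and page are invented.

```python
# Simplified TF-IDF sketch: score the terms of one page against a small
# corpus and return the highest-weighted words (the kind of terms Cantina
# would send to a search engine). Tokenization here is deliberately naive.
import math
import re
from collections import Counter

def tokenize(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())

def top_tfidf_terms(page: str, corpus: list, k: int = 5) -> list:
    """Return the k terms of `page` with the highest TF-IDF weight."""
    docs = [tokenize(d) for d in corpus]
    page_tokens = tokenize(page)
    tf = Counter(page_tokens)
    n_docs = len(docs) + 1  # the corpus plus the page itself
    scores = {}
    for term, count in tf.items():
        df = 1 + sum(term in doc for doc in docs)  # +1: the page contains it
        idf = math.log(n_docs / df)
        scores[term] = (count / len(page_tokens)) * idf
    return sorted(scores, key=scores.get, reverse=True)[:k]

corpus = ["welcome to our online banking portal",
          "read the latest sports news and scores",
          "university admissions and tuition information"]
page = "verify your online banking password to restore account access"
print(top_tfidf_terms(page, corpus))
```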

Prakash, et al. [7] proposed a blacklist-based approach. To detect phishing sites, they predict new phishing URLs by combining the features of known valid phishing URLs. To generate new candidate phishing URLs, they combine operations on the IP address, top-level domain (TLD), directory structure, and brand names of existing phishing URLs.

Routhu, et al. [8] proposed a web content checking system based on common web factors and features. They crawl the page source from a suspicious phishing website's login page and feed fake values into the login fields. Their algorithm and detection system then monitor and check the response from the website to determine whether or not it is a phishing website.

Saeed Abu-Nimeh, et al. [9], reminding us how important it is to detect phishing websites, calculate and simulate dangerous phishing attacks with different machine learning methods, including Logistic Regression, Classification and Regression Trees, Bayesian Additive Regression Trees, Support Vector Machines, Random Forests, and Neural Networks.

2.4 Tor network

The Tor network, also known as The Onion Router, allows a user to enter a network with worldwide anonymous encrypted communication. In 1995, the United States Naval Research Laboratory and DARPA sponsored an anonymous networking communication tool that could hide its own tracks on the internet. In late 2004, the Electronic Frontier Foundation (EFF) began sponsorship of the Tor project; in late 2005 EFF stopped supporting the Tor project as a sponsor but continued to maintain the Tor project website.

Using the Tor browser, a user sets up an onion proxy on the local machine that communicates with other Tor network users within a certain period to build a virtual network circuit. In the Tor network, every user appears as a node in a circuit, and the network also provides dynamic integration: once a user visits an IP address via the Tor browser, it randomly picks a node as the external exit node. Typically, based on considerations of both loading rate and security, each stream has only 2 to 5 nodes.

Consider the situation shown in Figure 2.1, and assume Alice wants to visit Bob through the Tor network. Through the onion proxy, the Tor browser randomly picks 2 to 5 other Tor users as relay nodes ending in an external exit node, and all transactions and communications except the communication between Bob and the last exit node are encrypted. Bob is only able to identify the last exit node and cannot track the request back to Alice, so Bob cannot reliably identify the person he is communicating with.

Using Tor, most applications and programs are able to use the anonymous network for IRC, real-time communication, and visiting webpages. Tor also allows a user to choose the country of the last exit node and the number of nodes. In our research, we used the Tor network to locate our IP address in 5 different countries to enable testing of IP cloaking and geo-location cloaking.

Figure 2.1 How Tor Network Steps
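As a concrete illustration of relocating requests through Tor, the following minimal sketch sends a request through a local Tor client's SOCKS proxy. It assumes Tor is already running on 127.0.0.1:9050 and that the exit country is pinned in the torrc (e.g., ExitNodes and StrictNodes); these settings are our assumptions for illustration, not part of the thesis text.

```python
# Minimal sketch: issue an HTTP request through a locally running Tor client.
# Assumes Tor listens on the default SOCKS port 9050 and that the exit
# country is pinned in torrc, e.g.:
#   ExitNodes {us}
#   StrictNodes 1
# Requires `requests[socks]` for SOCKS proxy support.
import requests

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h: resolve DNS through Tor
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_via_tor(url: str) -> str:
    """Fetch a page so the web server sees the Tor exit node's IP, not ours."""
    resp = requests.get(url, proxies=TOR_PROXIES, timeout=60)
    return resp.text

if __name__ == "__main__":
    # A public Tor check page will report the exit node's address.
    print(fetch_via_tor("https://check.torproject.org/")[:200])
```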

2.5 Criteria Review

While phishing websites are usually poor in quality and easily recognized by victims who visit them directly, it is hard to identify a malicious website that combines both phishing and cloaking techniques. Users tend to trust search engines, but search engines can be deceived by cloaked malicious websites. With the Tor network, we are able to identify the cloaking techniques that appear in the top-relevance search result lists and detect the phishing websites directly. The Onion Router uses a P2P network to randomly transport communication through a circuit, hiding the source address and communication path.

CHAPTER 3. TECHNICAL APPROACH

Figure 3.1 Multi-tiered Architectural Framework for Malicious Domain Investigation and Detection

Figure 3.1 roughly shows the multi-tiered architectural framework of our malicious domain investigation and detection system.

3.1 Introduction

The Tier I module collects evidential data from suspicious websites provided by users and from website information we collect from search engines. In Tier II, we send a keyword set through the Tor network to search engine servers from different geo-locations; e.g., we might visit and load suspicious websites from the US, UK, FR, DE, and AU. After collecting the search engine results, we visit these websites at different times from different locations. The Tier III module verifies a suspicious website using all the features and returned data of the suspicious website collected by the multiple approaches. The following subsections describe in detail the steps and approaches taken to solve this problem.

3.2 Pre-setting

In this research, we run an active probing-based scheme and data analytics for investigating malicious Fast-Flux and Web-Cloaking based domains. Because of the massive data collection process, we have to start multiple tasks at the same time; we can either start multiple processes on one machine or run them on different servers. It is best to support multiple servers, and in our research we used one AWS server and one local machine to collect data. Furthermore, the network speed must be able to support viewing multiple large webpages at the same time; otherwise, the program would return 404 error codes or Document Object Model control exceptions. The Selenium package automates web applications for testing purposes, and even web-based administration tasks can be automated. The latest version of the Selenium package already combines WebDriver with the previous version of Selenium, which makes it powerful and easy to install and use.
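One straightforward way to start several collection tasks at the same time on a single machine is a process pool, sketched below. The worker function and the keyword list are placeholders standing in for the real search-and-load routine described later in Section 3.3.

```python
# Minimal sketch: run several keyword-collection tasks in parallel on one
# machine with a process pool. `collect_for_keyword` is a placeholder for
# the real search-and-load routine described in Section 3.3.
from multiprocessing import Pool

def collect_for_keyword(keyword: str) -> tuple:
    """Placeholder worker: query search engines and load results for one keyword."""
    # ... query engines, load pages through Tor, store rows in the database ...
    return keyword, 0  # (keyword, number of suspicious hits)

if __name__ == "__main__":
    keywords = ["cheap watches", "online pharmacy", "scholarship application"]
    with Pool(processes=4) as pool:
        for keyword, hits in pool.map(collect_for_keyword, keywords):
            print(keyword, hits)
```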

3.3 Data Collection

Figure 3.2 shows the procedure of data collection, with data sources composed of zone files, suspicious URLs submitted by users, and keyword sets. The zone file [21] contains all the Verisign domain names published this year, and our detection system also allows users to check any suspicious URL by a simple request to our API. We selected 440 different terms from several of the most common categories, and all such terms are sent as requests to different search engines multiple times from multiple geo-locations, followed by collection of all the search engine responses.

3.3.1 Collecting URLs

Collection of search terms is the cornerstone of our research method. To build a meaningful and convincing term set, we targeted keywords from different categories. The author of the first study on the cloaking technique, mentioned in Chapter 2, found that 29.5% of results for searches of cheap luxury products contained redirected URLs, which are highly likely cloaking phishing websites.

Figure 3.2 Procedure of Data Collection

The strategy for choosing search keywords is to cover the most common and necessary keywords while trying to minimize processing time on benign fetched content. We first collected 230 terms from both the medicine industry and the financial area, because we believed attackers would be most interested in these two categories: they fit the rhythm of the current times and people pay more attention to them than to any others. In the middle segment of our research, we collected 20 published dangerous search keyword terms [10] [11], which reminded us that attackers might not be limited to only these two categories. We therefore extended our keyword term set to three more categories: engineering keywords from different industries, educational keywords, and popular names. We realized that these three categories could still not cover all dangerous search terms, because a common aspect of dangerous words is that they represent the hottest topics at particular moments in time, so during the later interval of our project we added top popular search terms to our term set. We crawled the top 10 popular search terms from Google hot searches every week, in the hope that including such terms would help us fill gaps in our keyword search terms and detect short-term cloaking pages that current methods detect poorly. In addition to such hot terms, the other terms in our term set were included to help us detect other cloaking web pages.

Table 3.1 Keyword Set

Category   Number of terms
Financial  185
Medical    45
Industry   45
Education  70
Name       50
Hot Topic  45

3.3.2 Loading URLs

Figure 3.3 Google’s Approach

Figure 3.3 reproduces Table VII, the list of browser, network, and contextual configurations, from [23]. While it suggests a diverse group of environment variables, there is still a limitation with respect to the Geo-location cloaking technique, which is also the most common drawback of current detection methods. In our detection method, we use the Tor network to relocate while we are visiting a website, and we are therefore able to defeat the Geo-location cloaking technique.

We queried our search term sets to 5 of the world-wide top 10 search engines. We crawled the first 10 ranked sites from every search engine because they are the most dangerous sites for users who trust search engines. We visited each link listed in the top 10 ranking and collected all visible and invisible data from that website. To act as convincingly as possible like a real visitor, instead of a search engine spider visiting the suspicious website, we used the Selenium package to pretend to be a real visitor. Using Selenium, we were able to open a browser and visit any website, meaning that every query and request used in our research is identified as an actual user on the web server side.

Algorithm 1 Loading URLs
1: for keyword in keyword set do
2:   Query keyword to search engines through the Tor network
3:   Read response web page from search engines
4:   for item in response page do
5:     Record URL, Title, Abstract
6:     Collect DNS & GeoIP
7:     Open URL with Tor browser
8:     Wait for page loading
9:     Record HTML content
10:    Wait 1 day or 3 days
11:    Re-query keyword to search engines through the Tor network
12:   end for
13: end for
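A minimal sketch of the loading loop in Algorithm 1 is shown below. It assumes Selenium 4 with Firefox/geckodriver available and a Tor client exposing a SOCKS proxy on 127.0.0.1:9050; the result-extraction step is left as a placeholder because each engine's markup differs, and the recorded fields are illustrative.

```python
# Minimal sketch of Algorithm 1: open each search-result URL in a real
# browser routed through Tor and record its rendered HTML. Assumes
# geckodriver is on PATH and a Tor SOCKS proxy runs on 127.0.0.1:9050.
import time
from selenium import webdriver

def tor_firefox() -> webdriver.Firefox:
    """Firefox instance whose traffic (including DNS) goes through Tor."""
    opts = webdriver.FirefoxOptions()
    opts.add_argument("--headless")
    opts.set_preference("network.proxy.type", 1)
    opts.set_preference("network.proxy.socks", "127.0.0.1")
    opts.set_preference("network.proxy.socks_port", 9050)
    opts.set_preference("network.proxy.socks_remote_dns", True)
    return webdriver.Firefox(options=opts)

def load_urls(keyword: str, result_urls: list) -> list:
    """Visit every result URL for one keyword and capture the rendered HTML."""
    records = []
    driver = tor_firefox()
    try:
        for url in result_urls:
            driver.get(url)
            time.sleep(10)  # crude wait for the page (and its scripts) to load
            records.append({"keyword": keyword, "url": url,
                            "html": driver.page_source})
    finally:
        driver.quit()
    return records
```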

Table 3.2 Database Table

Category   Definition
ID         Unique ID, which is the primary key
Title      Title of the selected link as it appeared on the search engine
site       URL of the selected link as it appeared on the search engine
abstract   Abstract of the selected link as it appeared on the search engine
keyword    Keyword with which this link was found (foreign key)
html       Original HTML content of the loaded page
html body  HTML content of the loaded page without scripts and HTML tags
date       Date this item was inserted into the database
ac         Country we loaded this page from (UK, US, DE, FR, AU, CN)
GeoIP      Location where this website is registered
DNS        DNS information
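A minimal sketch of how records matching Table 3.2 could be stored is given below; SQLite and the column types are our assumptions for illustration, since the thesis does not specify the database engine.

```python
# Minimal sketch: a SQLite table mirroring the fields of Table 3.2.
# The DBMS choice and column types are illustrative assumptions.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    id        INTEGER PRIMARY KEY,
    title     TEXT,
    site      TEXT,   -- URL as it appeared on the search engine
    abstract  TEXT,
    keyword   TEXT,   -- search term that produced this result
    html      TEXT,   -- original HTML of the loaded page
    html_body TEXT,   -- text content with scripts and tags removed
    date      TEXT,   -- insertion date
    ac        TEXT,   -- country the page was loaded from (UK, US, DE, FR, AU, CN)
    geoip     TEXT,   -- registered location of the website
    dns       TEXT    -- DNS information
);
"""

def insert_result(conn: sqlite3.Connection, row: dict) -> None:
    conn.execute(
        "INSERT INTO results (title, site, abstract, keyword, html, html_body,"
        " date, ac, geoip, dns) VALUES (:title, :site, :abstract, :keyword,"
        " :html, :html_body, :date, :ac, :geoip, :dns)", row)
    conn.commit()

conn = sqlite3.connect("cloaking.db")
conn.executescript(SCHEMA)
```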

3.3.3 Classifier

To decrease latency in the data collection process, we decided to load scripts only the first time we visited a given website. Using PhantomJS allowed loading everything published by a website, even if it was trying to hide running scripts. This PhantomJS version of the data is stored in our database as a landmark for the website. We then unloaded client-side scripts and crawled all the other content the web holder published on the website using the Tor network from different visiting locations. We query each newly added term to all the search engines three times: the first time the full version of the website is loaded by PhantomJS, and the second and third times concise versions are loaded by the Tor browser. Every two weeks we reload the website with PhantomJS and start a test check aimed at that website. There are countless websites and content updates every day, and all search engines are time-sensitive; a single search engine can return different websites even when searched with the same keyword term only one day apart. From a search engine's results page, we collect all related data for a website, including its URL, Title, and Abstract. After opening a browser and visiting a given URL, we collect all the published content from the selected website and its DNS server information. To be more precise, we store several different versions of the content: the first version is the original HTML obtained by directly crawling the selected website, and the remaining versions are purely text content after removing all HTML tags and scripts from the HTML version. For more accuracy and to stay closer to typical user behavior, we limit our network speed to 25 MB/s, so we cannot load a website if it has too much massive scripting.

Algorithm 2 Classifier
1: for website in collected websites do
2:   Check relational factor between keyword and Title
3:   Check relational factor between keyword and Abstract
4:   Check relational factor between keyword and HTML
5:   for repeated websites do
6:     Check difference between contents
7:   end for
8: end for

It is well known that a cloaking phishing website is likely to have massive executable scripts on both the server side and the client side. Based on the information the web holder provides to search engines, we check the relational factors between a search term and the website. First, we simplify search terms based on the 26 most common and popular suffixes [12]. After simplifying a search term to its shortest possible form, we are able to calculate the longest possible term. If the search term contains multiple words, then, based on our calculations and collected data, we assume that the longest reasonable distance between multiple related words appearing in one paragraph is 15 words. As a result of the keyword-related tests, we determine four different influencing factors to identify possible cloaking phishing websites: the Title-, Abstract-, and HTML-related factors between the keyword term and the page, and the domain-repeated factor. Since our research is based on 5 different search engines and multiple searches using the same search-term set, it is possible that the same URL shows up on different search engines. Appearance of a URL on different engines makes this URL more trustworthy and convincing. We therefore set an influence factor for each entry: if the Title- or Abstract-related test fails, we add 1 suspicion point to this URL for each; if the HTML content fails its test, we add 3 points; and if the domain is repeated, we subtract one point from the current total for this URL.

Table 3.3 Relational Factor

Term             Pass  Fail
Title            0     1
Abstract         0     1
Domain Repeated  -1    0
Content          0     3
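The point scheme of Table 3.3 can be summarized in a few lines. The sketch below applies the scores exactly as listed (1 point for a failed Title or Abstract test, 3 for failed HTML content, minus 1 for a repeated domain); the boolean test results are assumed to come from the relational-factor checks described above, and only the bookkeeping is shown.

```python
# Minimal sketch of the suspicion scoring from Table 3.3. The boolean inputs
# are assumed to be produced by the relational-factor checks described above.
def suspicion_score(title_fails: bool, abstract_fails: bool,
                    content_fails: bool, domain_repeated: bool) -> int:
    score = 0
    if title_fails:
        score += 1          # Title relational test failed
    if abstract_fails:
        score += 1          # Abstract relational test failed
    if content_fails:
        score += 3          # HTML content relational test failed
    if domain_repeated:
        score -= 1          # same domain seen on other engines/keywords
    return score

# Example: failed Title and content tests, but the domain is repeated.
print(suspicion_score(True, False, True, True))  # -> 3
```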

3.3.4 Identify Trustful Website

According to our records and the performance of search engines, there is only a negligible possibility that one phishing domain tricks multiple search engines at the same time, so we query the same keyword to different engines with a 3-day gap to minimize the possible error in our approach. Because there is still a possibility that a cloaking phishing website has successfully tricked multiple search engines, we prepared an anti-detection method for handling this situation. To identify both the truly trustful websites and the suspicious websites, we have a Performance Handler.

Enhancing system performance is always important, so we use an approximate string matching system to speed up our suspicious-site filter; based on the length of the string we are going to compare, we have different policies for calculating a matching factor. We use the Boyer-Moore [13] string search algorithm if the keyword's length is less than 5, the Jaro-Winkler distance if the length is between 5 and 10, and the Levenshtein distance [14] if the length is greater than 10.
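A minimal sketch of this length-based dispatch follows. Python's substring operator and difflib's ratio stand in for Boyer-Moore and Jaro-Winkler respectively (dedicated implementations can be swapped in if exact behavior matters), while Levenshtein distance is implemented directly; the thresholds mirror the lengths quoted above.

```python
# Minimal sketch of the length-based matcher dispatch described above.
# Boyer-Moore and Jaro-Winkler are approximated: Python's substring operator
# stands in for the exact-match search, and difflib's ratio stands in for
# Jaro-Winkler similarity; swap in dedicated implementations if needed.
import difflib

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def match_score(keyword: str, text: str) -> float:
    """Return a similarity score in [0, 1] using the length-based policy."""
    if len(keyword) < 5:
        # short keyword: exact substring search (Boyer-Moore in the thesis)
        return 1.0 if keyword in text else 0.0
    if len(keyword) <= 10:
        # medium keyword: Jaro-Winkler in the thesis; difflib ratio as stand-in
        return difflib.SequenceMatcher(None, keyword, text).ratio()
    # long keyword: normalized Levenshtein distance
    dist = levenshtein(keyword, text)
    return 1.0 - dist / max(len(keyword), len(text))
```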

As mentioned earlier, we need to handle websites that appear repeatedly. The features of cloaking phishing websites and search engines require examining all such websites, but this is inefficient and decreases our performance when the website is in fact safe. To handle this phenomenon, we use an algorithm that separates suspicious phishing websites from trustful domain names.

Algorithm 3 Identifier
1: for website in collected websites do
2:   Cross-check with different engines
3:   if website repeated then
4:     Check relational factors
5:     Count appearance times C1
6:     if C1 > 15 then
7:       Add website to trustful DB
8:     end if
9:   end if
10: end for

We perform a trustful-factor check on all websites to improve the performance of our collecting process. For each website, we first apply the filter check (Algorithm 2); if the website fails the filter check, we no longer consider it trustful, i.e., it is a suspicious website. If a website is recorded multiple times in our database, we must still determine whether it is a potential cloaking website, so we check whether it appeared multiple times on different search engines. Once a domain name appears on multiple search engines with multiple keywords and has appeared more than 15 times, we perform a further check on the website. First of all, we pass the website to our detection system to check whether it is safe, and after the website has passed our detection system, we subtract 1 point from the total suspicion score of this domain name. Once a website has failed our detection test, we mark the domain as suspect and never again consider it a trustful domain. The reason this situation is considered so serious is that some attackers might use a known famous public domain and add their own scripts, e.g., weebly.com/phishing-scripts.

At the very beginning of our study, we added some famous domain names, including www.youtube.com, www.tieba.baidu.com, twitter.com, www.cbsnews.com, and www.wikipedia.org, as trustworthy names to improve the performance of our detection system. On the other hand, if a domain name appears on multiple search engines with different keywords searched, but appears fewer than 5 times and fails our filter test, we consider it a suspicious cloaking phishing website. During our collection process, there were some websites whose webpages could not be loaded, possibly caused by location limitations and script limitations. We classify each unloaded website as a suspicious website, because we believe our network environment is able to support most legitimate websites; although we could not load the limited websites, we were able to gather the published information about such websites from search engines, and that information is an important part of the further detection check.
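The bookkeeping behind this trust decision can be sketched as follows, using the thresholds quoted in the text (trusted after more than 15 cross-engine appearances, suspicious when seen fewer than 5 times and failing the filter); the function signature and record fields are assumptions made for illustration.

```python
# Minimal sketch of the trust/suspicion decision described above. The counts
# and flags are assumed to be maintained elsewhere; thresholds follow the
# text (more than 15 appearances to trust, fewer than 5 plus a failed filter
# to mark suspicious).
def classify_domain(appearances: int, failed_filter: bool,
                    failed_detection: bool) -> str:
    if failed_detection:
        return "suspicious"   # never trusted again once detection fails
    if failed_filter and appearances < 5:
        return "suspicious"   # rarely seen and already failed the filter
    if appearances > 15 and not failed_filter:
        return "trustful"     # widely confirmed by multiple engines/keywords
    return "undecided"        # keep collecting evidence

print(classify_domain(appearances=20, failed_filter=False, failed_detection=False))
```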

3.4 Detect Cloaking Phishing Websites

Our detection system analyzes all the suspicious cloaking websites that failed our previous filter test. To handle the location limitations of the loading problem and IP-address-cloaking websites, we first use GeoIP to locate the country of origin of a suspicious website. GeoIP locates a computer terminal's geographic location by identifying the terminal's IP address. Once we know the country of a website, we are able to use the Tor network to relocate our current address to the selected country, minimizing the risk of possible visit refusals caused by location limitations. To detect location-limited websites, we reload the website 3 times over the next 9 days and send it to our search-engine detection system. After we use GeoIP to filter legitimate and safe websites from our suspicious list, we use the Tor network to reload the suspicious websites from three different locations chosen from the following options: United States, France, Germany, United Kingdom, and Australia. To detect cloaking phishing websites, we simplify the HTML content to text-only content, removing all tags, scripts, and documents. Even legitimate websites may display different webpages to visitors from different locations, but those display differences are caused by scripts, so we remove all the scripts and flash documents, keeping only the ordinary text content; there is no reason an ordinary website would have different text content. Once we compare the three different versions of the text content, we know which websites are cloaking phishing websites.
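A minimal sketch of this text-only comparison is given below: the HTML fetched from each location is reduced to visible text (script and style bodies and all tags removed), and the site is flagged if any two versions differ. The 0.9 similarity threshold is an assumption for illustration.

```python
# Minimal sketch: reduce each location's HTML to plain text (dropping
# <script>/<style> bodies and all tags) and flag the site if any two
# versions differ. The 0.9 similarity threshold is an illustrative choice.
import difflib
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect visible text, skipping script and style content."""
    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def to_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    return " ".join(parser.chunks)

def is_cloaking(html_versions: list, threshold: float = 0.9) -> bool:
    """Compare text-only versions fetched from different locations."""
    texts = [to_text(h) for h in html_versions]
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if difflib.SequenceMatcher(None, texts[i], texts[j]).ratio() < threshold:
                return True
    return False
```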

Algorithm 4 Detection
1: for website in suspicious websites do
2:   Visit URL from different locations
3:   Google Safe Browsing (Condition 1)
4:   Different web content (Condition 2)
5:   Search URL on different search engines
6:   for response do
7:     Empty response (Condition 3)
8:     Famous phishing detection websites (Condition 4)
9:     Normal page (Condition 5)
10:   end for
11: end for

Conditions 1 and 4: This website is a verified phishing website.

Conditions 2 and 3: We use the gathered GeoIP and DNS information. The DNS information [16, 17, 18, 19, 20] recorded for each website consists of the server location, name server, IPv4 address, and IPv6 address. Once the IP address of a domain is gathered, we map the IP addresses to the ASN, network address, organization name, and country code. If we are unable to resolve a domain name using DNS, an NXDOMAIN response is returned, meaning that the domain does not exist on the internet. Because malicious domains have a much shorter active time than legitimate domains, by comparing the previously gathered domain information with the DNS response we are able to evaluate the domain.
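A minimal sketch of the DNS part of these conditions, using only the standard library, is shown below: it resolves a domain's IPv4/IPv6 addresses and treats a resolution failure as the NXDOMAIN-style case. Mapping addresses to ASN, organization, and country requires an external GeoIP/ASN database and is deliberately omitted.

```python
# Minimal sketch of the DNS check behind Conditions 2 and 3: gather the
# current IPv4/IPv6 addresses of a domain, or report an NXDOMAIN-style
# failure. ASN/organization/country mapping needs an external database
# and is omitted here.
import socket

def resolve_domain(domain: str) -> dict:
    """Return the domain's addresses, or mark it unresolvable."""
    try:
        infos = socket.getaddrinfo(domain, None)
    except socket.gaierror:
        return {"domain": domain, "resolvable": False, "addresses": []}
    addresses = sorted({info[4][0] for info in infos})
    return {"domain": domain, "resolvable": True, "addresses": addresses}

def dns_changed(previous: dict, current: dict) -> bool:
    """Compare a stored DNS record with a fresh lookup (fast-flux indicator)."""
    return set(previous.get("addresses", [])) != set(current.get("addresses", []))

print(resolve_domain("example.com"))
```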

Condition 5: We collect the returned page and look for the URL we searched for; if it appears on all the search engines, we assume it is a legitimate website. If it does not appear on the list but a similar URL does appear, we compare the abstract and title of both URLs, and if both URLs have published similar information and have similar URLs, we assume the suspicious URL is a phishing website.

In our recorded data, most cloaking phishing websites appear only for a short time, usually no more than three days, on the suggestion lists provided by search engines, so if we get an empty result from a search engine while performing a search test, we can say that the website has a high likelihood of being a phishing website. Because the search test only tests websites that were on the suggestion lists of the search engines, there is no reason that we would get a URL from Search Engine A while getting an empty result when asking Search Engine A about that URL. If a website fails our search test, we add its content to our database and record common words to determine the most popular words in phishing websites. Furthermore, if a website fails our search test and the search returns a regular list with a similar URL of a legitimate website, we add both the phishing URL and the legitimate URL to our database and calculate the most dangerous legitimate URLs. For each round of detecting phishing websites, the program updates the data related to phishing websites, so if a website contains a common phishing word or its URL is similar to a famous legitimate URL, we directly add it to our suspicious website database and immediately begin a search test.

Based on existing detection techniques, we decided on 6 different factors as our phishing detection features: the distinct IP addresses of both host and name server, the distinct ASNs of those IP addresses, and the distinct countries of those IP addresses. We query all the suspicious websites and zone files to collect these factors and store them in our database.

For the suspicious websites, we have already recorded this information while searching for and loading the URL; thus, we only need to compare the query record with the database.

Figure 3.4 and Figure 3.5 show a phishing website (sztdpower.com/product/show-7365902.html) detected by our detection system. This website was listed among the top 10 related links from the 'Bing' search engine when the Chinese keyword 'laiyangjiagong' was searched. As we can see from Figure 3.5, it has an internal webpage called hbc688.com, the real web content shown to visitors, but a search engine's crawler might only record the data from sztdpower.com, a legitimate and safe website. Without our detection, people would identify sztdpower.com as a legitimate website and no one would know there is a phishing website underneath. Since all content from sztdpower.com is safe and clean, this website would not even be considered a phishing website by most detection systems. In the end, we searched each selected suspicious website using multiple search engines and checked the returned results to determine whether this website had related information provided by different search engines. Because all the websites and domains we collected come from search engines' results, if we search a specific website on multiple search engines and obtain nonsense results, we can confirm that this website is a phishing website.

Figure 3.4 Phishing example 1

Figure 3.5 Phishing example 2

CHAPTER 4. EXPERIMENTAL EVALUATION

4.1 Introduction

At the beginning of our approach, we loaded suspicious websites from different countries and recorded their HTML content in a database to check whether they had similar content even when loaded from different countries. We also recorded the DNS server information every time we loaded a website and compared it with our records; once the DNS server information differs from what was recorded and the website fails our check algorithm, we can say that it is a phishing website.

There are two ways that users might be tricked by cloaking phishing websites. First, they might be tricked by spam email and advertisements; second, they might visit cloaking websites that successfully tricked search engine optimization and are listed among the top 10 related ranking websites provided by search engines. Figures 3.4 and 3.5 reflect the searched term 'laiyangjiagong' on the Bing search engine.

4.1.1 Hypothesis

To detect the cloaking phishing websites that tricked SEO, we collected 413 different keywords from different categories in our search keyword set and recorded 50,000 query results; there were 206 (0.412%) cloaking phishing websites listed in the top 10 keyword-related lists provided by search engines. Furthermore, we gathered 100 phishing websites that had been confirmed as valid active phishing websites from www.phishtank.com [22] to test our detection system; the results show that our detection system has a 98% accuracy rate, because one of our detection protocols searches each suspicious website on multiple search engines, which means that as long as one famous phishing detection website has detected a phishing domain, there is little chance it will pass our detection system.

Figure 4.1 and Figure 4.2 display the data collected by querying keyword terms to the search engines. The differences between search engines are caused by our trustful-website system. For example, on the 'Search' search engine most of the popular keyword terms returned cbsnews.com as a related website, but after the system added cbsnews.com to our trustful website database, the number of results from the 'Search' search engine showed a huge decline. As the figures show, 9.164% of websites failed our first check, and 4.495% of these suspicious websites failed our detection system, meaning that if you click 1000 times on the top 10 related search results from search engines, there is a 0.412% likelihood that you visited a phishing website.

Figure 4.1 Phishing-Suspicious

Figure 4.2 Suspicious-Total

CHAPTER 5. SUMMARY AND FUTURE WORK

In this paper, we presented the design of an anti-cloaking phishing-website approach with a high accuracy rate and wide range. Based on our Classifier section, it is obvious that our performance kept improving over time, with higher and higher efficiency as more websites were detected.

According to our results, search engines can still be tricked by phishing attackers through search engine optimization, as shown in the phishing example figures. We noticed that some phishing URLs are similar to known famous websites, e.g., www.paybills-pal-sumary-udate.com vs. www.paypal.com, www.linkedln.com vs. www.linkedin.com, and www.zijiamining.com vs. www.zijinmining.com. We also found that some phishing websites contain words like 'online gambling', 'under18', 'sexygirl', and 'free download'. It is possible that we could use these two factors, along with our recorded data, to identify phishing websites without loading web pages multiple times.

In the future, we plan to develop and optimize a better, much more user-friendly interface so that a user can check suspicious websites through it. We also expect to add more phishing-website features to our detection system to enhance its performance, including graphical improvements.

REFERENCES

[1] Baoning Wu and Brian D. Davison. Cloaking and redirection: A preliminary study. In Proceedings of the SIGIR Workshop on Adversarial Information Retrieval on the Web (AIRWeb), May 2005.

[2] L. F. Cranor, S. Egelman, J. Hong, and Y. Zhang. Phinding phish: An evaluation of anti-phishing toolbars. Technical Report CMU-CyLab-06-018, CMU, November 2006.

[3] A. Perkins. White paper: The classification of search spam, September 2001. Online at http://www.silverdisc.co.uk/articles/spamclassification/.

[4] David Y. Wang, Stefan Savage, and Geoffrey M. Voelker. Cloak and dagger: Dynamics of web search cloaking. October 2011.

[5] G. Varshney, M. Misra, and P. K. Atrey. A phish detector using lightweight search features. Computers and Security, 62:213–228, 2016.

[6] Y. Zhang, J. I. Hong, and L. F. Cranor. Cantina: A content-based approach to detecting phishing web sites. In Proceedings of the 16th International Conference on World Wide Web, 2007.

[7] P. Prakash, M. Kumar, R. R. Kompella, and M. Gupta. PhishNet: Predictive blacklisting to detect phishing attacks. In INFOCOM, 2010 Proceedings IEEE, pages 1–5. IEEE, 2010.

[8] Routhu Srinivasa Rao and Alwyn R. Pais. Detecting phishing websites using automation of human behavior. In CPSS'17: Proceedings of the 3rd ACM Workshop on Cyber-Physical System Security, pages 33–42, 2017.

[9] Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, and Suku Nair. A comparison of machine learning techniques for phishing detection. October 4, 2007.

[10] 'The top 10 most dangerous internet search terms'. [Online]. Available: http://www.telegraph.co.uk/comment/5406066/The-top-10-mostdangerous-internet-search-terms.html, May 29, 2009.

[11] 'What are the most dangerous search keywords on the internet'. [Online]. Available: www.techtica.com/the-most-dangeroussearch-keywords/, February 10, 2016.

[12] 'A List of 26 Common Suffixes in English'. [Online]. Available: https://www.thoughtco.com/common-suffixes-in-english-1692725.

[13] Robert S. Boyer and J Strother Moore. A fast string searching algorithm. Communications of the ACM, 20(10), October 1977.

[14] Li Yujian and Liu Bo. A normalized Levenshtein distance metric. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 2007.

[15] Seyed Jalal and HamidReza Naji. GeoIP clustering: Solving the replica server placement problem in content delivery networks by clustering users according to their physical locations. In 5th Conference on Information and Knowledge Technology, October 2013.

[16] C. Chen, M. Huang, and Y. Ou. Detecting hybrid botnets with web command, vol. 5, no. 2, pp. 263–274, 2014.

[17] Z. Berkay Celik and S. Oktug. Detection of fast-flux networks using various DNS feature sets. 2013 IEEE Symposium on Computers and Communications, pp. 000868–000873, July 2013.

[18] A. Shaikh, R. Tewari, and M. Agrawal. On the effectiveness of DNS-based server selection. In Proceedings of IEEE INFOCOM 2001 (Cat. No.01CH37213), vol. 3, 2001.

[19] L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi. EXPOSURE: Finding malicious domains using passive DNS analysis. NDSS, pp. 1–17, 2011.

[20] Tsolmon Otgonbold and Yong Guan. ADAPT: An anonymous, distributed and active probing-based technique for detecting malicious fast-flux domains. 2014.

[21] 'Zone File Information and TLD Zone Files - Verisign.' [Online]. Available: https://www.verisign.com

[22] 'PhishTank'. [Online]. Available: https://www.phishtank.com

[23] Luca Invernizzi, Kurt Thomas, Alexandros Kapravelos, Oxana Comanescu, Jean-Michel Picod, and Elie Bursztein. Cloak of visibility: Detecting when machines browse a different web. 2016 IEEE Symposium on Security and Privacy.

[24] 'Introduction to Tor Network'. [Online]. Available: https://www.vpnmentor.com/blog/tor-browser-work-relate-using-vpn/